2023-11-25 20:20:52,055 INFO [train_asr.py:1303] (2/4) Training started
2023-11-25 20:20:52,056 INFO [train_asr.py:1313] (2/4) Device: cuda:2
2023-11-25 20:20:52,078 INFO [train_asr.py:1325] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1125112954-6d844cbdd8-m6xmg', 'IP address': '10.177.94.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-25 20:20:52,079 INFO [train_asr.py:1334] (2/4) About to create model
2023-11-25 20:20:52,758 INFO [train_asr.py:1338] (2/4) Number of model parameters: 65819362
2023-11-25 20:20:52,759 INFO [train_asr.py:1362] (2/4) Using CED labels!
2023-11-25 20:20:52,759 INFO [checkpoint.py:112] (2/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt
2023-11-25 20:20:56,553 INFO [train_asr.py:1370] (2/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-25 20:20:59,847 INFO [train_asr.py:1379] (2/4) Using DDP
2023-11-25 20:21:00,335 INFO [train_asr.py:1402] (2/4) Loading optimizer state dict
2023-11-25 20:21:00,837 INFO [train_asr.py:1410] (2/4) Loading scheduler state dict
2023-11-25 20:21:00,877 INFO [train_asr.py:1432] (2/4) Getting audioset cuts
2023-11-25 20:21:00,877 INFO [kd_datamodule.py:784] (2/4) About to get the audioset cuts.
2023-11-25 20:21:00,964 INFO [train_asr.py:1438] (2/4) Using mux to combine Librispeech with audioset
2023-11-25 20:21:00,964 INFO [train_asr.py:1449] (2/4) CutSet(len=2748469) [underlying data type: ]
2023-11-25 20:21:10,052 INFO [kd_datamodule.py:396] (2/4) Enable MUSAN
2023-11-25 20:21:10,052 INFO [kd_datamodule.py:397] (2/4) About to get Musan cuts
2023-11-25 20:21:12,648 INFO [kd_datamodule.py:427] (2/4) Enable SpecAugment
2023-11-25 20:21:12,648 INFO [kd_datamodule.py:428] (2/4) Time warp factor: 80
2023-11-25 20:21:12,648 INFO [kd_datamodule.py:438] (2/4) Num frame mask: 10
2023-11-25 20:21:12,648 INFO [kd_datamodule.py:451] (2/4) About to create train dataset
2023-11-25 20:21:12,649 INFO [kd_datamodule.py:487] (2/4) Using SimpleCutSampler
2023-11-25 20:21:12,649 INFO [kd_datamodule.py:495] (2/4) About to create train dataloader
2023-11-25 20:21:12,652 INFO [kd_datamodule.py:802] (2/4) About to get the audioset eval cuts.
2023-11-25 20:21:12,653 INFO [train_asr.py:1513] (2/4) CutSet(len=20681) [underlying data type: ]
2023-11-25 20:21:12,708 INFO [kd_datamodule.py:529] (2/4) About to create dev dataset
2023-11-25 20:21:13,148 INFO [kd_datamodule.py:550] (2/4) About to create dev dataloader
2023-11-25 20:21:13,149 INFO [train_asr.py:1527] (2/4) Loading grad scaler state dict
2023-11-25 20:21:48,357 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 0, loss[loss=0.1221, simple_loss=0.07771, pruned_loss=0.008435, audio_tagging_loss=0.07477, over 15299.00 frames. ], tot_loss[loss=0.1221, simple_loss=0.07771, pruned_loss=0.008435, audio_tagging_loss=0.07477, over 15299.00 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:21:48,357 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-25 20:22:20,741 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.127, simple_loss=0.05083, pruned_loss=0.005243, audio_tagging_loss=0.09629, over 4681554.00 frames.
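[Editor's note: the "mux"/MUSAN/SpecAugment/SimpleCutSampler lines above correspond to a lhotse data pipeline. A minimal sketch of how those pieces could fit together, assuming illustrative manifest paths and a 50/50 mux weighting (neither is stated in this log); the real setup lives in kd_datamodule.py:

    from lhotse import CutSet
    from lhotse.dataset import CutMix, SimpleCutSampler, SpecAugment

    # Lazily open the training manifests (paths are placeholders).
    librispeech_cuts = CutSet.from_jsonl_lazy("data/fbank/librispeech_cuts_train.jsonl.gz")
    audioset_cuts = CutSet.from_jsonl_lazy("data/fbank/audioset_cuts_unbalanced.jsonl.gz")
    musan_cuts = CutSet.from_jsonl_lazy("data/fbank/musan_cuts.jsonl.gz")

    # "Using mux to combine Librispeech with audioset": interleave the two streams;
    # stop_early=False (as in the config above) keeps drawing until both are exhausted.
    train_cuts = CutSet.mux(librispeech_cuts, audioset_cuts, weights=[0.5, 0.5], stop_early=False)

    # "Enable MUSAN": mix noise cuts into training examples at a random SNR.
    noise_transform = CutMix(musan_cuts, snr=(10, 20))
    # "Enable SpecAugment" with time warp factor 80 and 10 frame masks, as logged.
    spec_augment = SpecAugment(time_warp_factor=80, num_frame_masks=10)

    # "Using SimpleCutSampler" with max_duration=1000 (seconds of audio per batch).
    sampler = SimpleCutSampler(train_cuts, max_duration=1000.0, shuffle=True, drop_last=True)
]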
2023-11-25 20:22:20,742 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-25 20:22:28,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3046020.0, ans=0.0
2023-11-25 20:22:41,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3046086.6666666665, ans=0.0
2023-11-25 20:23:10,926 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 456950
2023-11-25 20:23:14,415 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=32.73 vs. limit=22.5
2023-11-25 20:23:16,270 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 50, loss[loss=0.1227, simple_loss=0.1417, pruned_loss=0.02876, audio_tagging_loss=0.02308, over 15310.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.09219, pruned_loss=0.01288, audio_tagging_loss=0.04188, over 685981.29 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:23:16,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3046353.3333333335, ans=0.0
2023-11-25 20:23:28,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3046420.0, ans=0.1
2023-11-25 20:23:38,199 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:23:39,123 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.865e+01 9.525e+01 1.035e+02 1.246e+02 6.272e+02, threshold=2.069e+02, percent-clipped=17.0
2023-11-25 20:23:43,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3046486.6666666665, ans=0.125
2023-11-25 20:23:47,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5
2023-11-25 20:23:47,814 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:23:52,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3046553.3333333335, ans=0.5
2023-11-25 20:24:00,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3046620.0, ans=0.1
2023-11-25 20:24:06,630 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457000
2023-11-25 20:24:12,185 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 100, loss[loss=0.08796, simple_loss=0.1041, pruned_loss=0.01222, audio_tagging_loss=0.02369, over 16062.00 frames. ], tot_loss[loss=0.09319, simple_loss=0.09036, pruned_loss=0.01264, audio_tagging_loss=0.03537, over 1208058.76 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 32.0
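[Editor's note: in the loss[...] entries, the headline loss is consistent with the configured weights (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0): for batch 0 above, 0.5 * 0.07771 + 0.008435 + 1.0 * 0.07477 ≈ 0.1221. A sketch of how these fields appear to combine; the actual icefall code also ramps the simple-loss weight during warmup, which is past by batch 456950:

    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Reproduces the batch-0 entry: 0.5 * 0.07771 + 0.008435 + 1.0 * 0.07477 ~= 0.1221
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)
]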
2023-11-25 20:24:13,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3046686.6666666665, ans=0.125
2023-11-25 20:24:34,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046820.0, ans=0.1
2023-11-25 20:24:40,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3046820.0, ans=0.0
2023-11-25 20:24:42,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3046886.6666666665, ans=0.5
2023-11-25 20:24:46,401 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0
2023-11-25 20:24:47,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3046886.6666666665, ans=0.0
2023-11-25 20:24:58,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0
2023-11-25 20:25:00,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457050
2023-11-25 20:25:02,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3046953.3333333335, ans=0.0
2023-11-25 20:25:05,938 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 150, loss[loss=0.08924, simple_loss=0.1034, pruned_loss=0.02117, audio_tagging_loss=0.01639, over 14670.00 frames. ], tot_loss[loss=0.08689, simple_loss=0.09017, pruned_loss=0.0126, audio_tagging_loss=0.02921, over 1614530.96 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:25:28,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.752e+01 9.435e+01 1.031e+02 1.991e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-25 20:25:31,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0
2023-11-25 20:25:55,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457100
2023-11-25 20:26:00,407 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 200, loss[loss=0.07517, simple_loss=0.09724, pruned_loss=0.01306, audio_tagging_loss=0.01349, over 16290.00 frames. ], tot_loss[loss=0.08204, simple_loss=0.09069, pruned_loss=0.01281, audio_tagging_loss=0.02388, over 1935536.91 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:26:08,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. limit=10.0
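[Editor's note: the recurring ScheduledFloat lines report module hyperparameters (skip rates, dropout probabilities, min_abs/max_abs limits, ...) that follow piecewise-linear schedules over the global batch count; the real class is in icefall's zipformer scaling.py. A stand-in for the idea, with made-up schedule points:

    def scheduled_float(batch_count, points):
        """points: [(batch_count, value), ...] sorted by batch_count; linear in between."""
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # e.g. a skip-rate that decays from 0.5 to 0.0 over the first 20k batches:
    skip_rate = scheduled_float(3046020.0, [(0, 0.5), (20000, 0.0)])  # -> 0.0, matching ans=0.0
]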
2023-11-25 20:26:43,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3047620.0, ans=0.125
2023-11-25 20:26:44,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3047620.0, ans=0.125
2023-11-25 20:26:50,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457150
2023-11-25 20:26:51,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3047620.0, ans=0.2
2023-11-25 20:26:53,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5
2023-11-25 20:26:55,837 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 250, loss[loss=0.07168, simple_loss=0.09132, pruned_loss=0.01664, audio_tagging_loss=0.009379, over 14672.00 frames. ], tot_loss[loss=0.07817, simple_loss=0.0904, pruned_loss=0.01271, audio_tagging_loss=0.02026, over 2180284.07 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:27:01,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3047686.6666666665, ans=0.05
2023-11-25 20:27:16,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.232e+01 9.778e+01 1.082e+02 1.251e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-25 20:27:26,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3047886.6666666665, ans=0.09899494936611666
2023-11-25 20:27:35,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0
2023-11-25 20:27:44,426 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457200
2023-11-25 20:27:50,100 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 300, loss[loss=0.05412, simple_loss=0.06071, pruned_loss=0.01045, audio_tagging_loss=0.01331, over 15012.00 frames. ], tot_loss[loss=0.07541, simple_loss=0.09002, pruned_loss=0.01275, audio_tagging_loss=0.01765, over 2374051.94 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:27:52,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=22.5
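[Editor's note: the optim.py lines report five quantiles (min, 25%, median, 75%, max) of recent gradient norms, and the threshold is consistently clipping_scale times the median: 2.0 * 9.778e+01 ≈ 1.956e+02 just above. A sketch of that bookkeeping, not icefall's ScaledAdam itself:

    import torch

    def clip_to_median(params, recent_norms, clipping_scale=2.0):
        # Total gradient norm of this step, appended to a rolling history.
        norm = torch.norm(torch.stack([p.grad.norm() for p in params if p.grad is not None]))
        recent_norms.append(norm.item())
        quartiles = torch.quantile(torch.tensor(recent_norms),
                                   torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2].item()  # 2.0 * median
        if norm > threshold:  # such steps are what "percent-clipped" counts
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return quartiles, threshold
]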
2023-11-25 20:28:03,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3048086.6666666665, ans=0.125
2023-11-25 20:28:04,900 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:28:13,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3048153.3333333335, ans=0.0
2023-11-25 20:28:20,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3048153.3333333335, ans=0.125
2023-11-25 20:28:25,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3048220.0, ans=0.0
2023-11-25 20:28:26,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3048220.0, ans=0.0
2023-11-25 20:28:32,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3048286.6666666665, ans=0.125
2023-11-25 20:28:35,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048286.6666666665, ans=0.125
2023-11-25 20:28:36,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3048286.6666666665, ans=0.1
2023-11-25 20:28:38,384 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457250
2023-11-25 20:28:43,511 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 350, loss[loss=0.06098, simple_loss=0.07382, pruned_loss=0.01148, audio_tagging_loss=0.01259, over 14794.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.0904, pruned_loss=0.01272, audio_tagging_loss=0.016, over 2520952.46 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:28:48,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048353.3333333335, ans=0.1
2023-11-25 20:28:55,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=12.0
2023-11-25 20:28:58,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3048420.0, ans=0.0
2023-11-25 20:29:05,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.941e+01 9.403e+01 1.014e+02 1.528e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-25 20:29:06,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0
2023-11-25 20:29:10,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0
2023-11-25 20:29:12,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3048486.6666666665, ans=0.2
2023-11-25 20:29:31,904 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457300
2023-11-25 20:29:38,102 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 400, loss[loss=0.04928, simple_loss=0.05486, pruned_loss=0.009597, audio_tagging_loss=0.01225, over 14786.00 frames. ], tot_loss[loss=0.07372, simple_loss=0.09201, pruned_loss=0.0131, audio_tagging_loss=0.01462, over 2641419.22 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:29:45,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3048686.6666666665, ans=0.0
2023-11-25 20:29:59,122 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=22.5
2023-11-25 20:30:18,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3048886.6666666665, ans=0.125
2023-11-25 20:30:21,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3048953.3333333335, ans=0.125
2023-11-25 20:30:25,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3048953.3333333335, ans=0.2
2023-11-25 20:30:26,456 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457350
2023-11-25 20:30:32,072 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 450, loss[loss=0.06676, simple_loss=0.09437, pruned_loss=0.01248, audio_tagging_loss=0.007104, over 15044.00 frames. ], tot_loss[loss=0.07214, simple_loss=0.09112, pruned_loss=0.01299, audio_tagging_loss=0.01358, over 2723781.80 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:30:34,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3049020.0, ans=0.125
2023-11-25 20:30:40,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5
2023-11-25 20:30:52,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.642e+01 9.445e+01 1.011e+02 1.527e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-25 20:30:56,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3049153.3333333335, ans=0.0
2023-11-25 20:30:58,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3049153.3333333335, ans=0.1
2023-11-25 20:31:20,671 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457400
2023-11-25 20:31:26,193 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 500, loss[loss=0.07712, simple_loss=0.1011, pruned_loss=0.01596, audio_tagging_loss=0.01063, over 13470.00 frames. ], tot_loss[loss=0.07108, simple_loss=0.09042, pruned_loss=0.01297, audio_tagging_loss=0.0129, over 2800505.92 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:31:42,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0
2023-11-25 20:31:46,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3049420.0, ans=0.125
2023-11-25 20:31:54,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=22.5
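[Editor's note: the lr field ticks from 1.75e-03 down to 1.74e-03 at batch 500 above because icefall's Eden scheduler decays with both the batch index and the epoch. A sketch of the Eden rule using this run's config (base_lr=0.045, lr_batches=7500, lr_epochs=3.5); the exact batch/epoch offsets icefall applies may differ slightly:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, batch=457000, epoch=39))  # ~1.7e-03, in line with the logged lr
]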
2023-11-25 20:32:03,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3049553.3333333335, ans=0.0
2023-11-25 20:32:03,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=12.0
2023-11-25 20:32:05,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5
2023-11-25 20:32:14,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457450
2023-11-25 20:32:16,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3049620.0, ans=0.2
2023-11-25 20:32:20,601 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 550, loss[loss=0.05893, simple_loss=0.07409, pruned_loss=0.008907, audio_tagging_loss=0.01298, over 16123.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.08977, pruned_loss=0.01294, audio_tagging_loss=0.01246, over 2854282.35 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:32:24,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3049686.6666666665, ans=0.125
2023-11-25 20:32:34,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0
2023-11-25 20:32:42,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.908e+01 9.696e+01 1.036e+02 1.301e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-25 20:32:45,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3049820.0, ans=0.125
2023-11-25 20:32:53,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0
2023-11-25 20:33:08,799 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457500
2023-11-25 20:33:11,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0
2023-11-25 20:33:13,975 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 600, loss[loss=0.05675, simple_loss=0.06994, pruned_loss=0.009518, audio_tagging_loss=0.01227, over 15105.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.08883, pruned_loss=0.01268, audio_tagging_loss=0.01201, over 2900703.91 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
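[Editor's note: grad_scale drops from 32.0 to 16.0 between batches 500 and 550, which is the fp16 loss scaler reacting to an overflow (use_fp16=True in the config; note also the "Loading grad scaler state dict" line at startup). A generic torch.cuda.amp step showing where that number comes from; model, optimizer, and batch are placeholders, not this script's objects:

    import torch

    def amp_step(model, optimizer, scaler, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)  # placeholder forward pass returning a scalar loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skips the update if gradients overflowed
        scaler.update()            # halves the scale on overflow, e.g. 32.0 -> 16.0
        return scaler.get_scale()  # the "grad_scale" value printed in the log

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)
]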
2023-11-25 20:33:18,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3050020.0, ans=0.0
2023-11-25 20:33:32,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3050086.6666666665, ans=0.0
2023-11-25 20:34:02,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457550
2023-11-25 20:34:03,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3050286.6666666665, ans=0.0
2023-11-25 20:34:07,282 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:34:08,199 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 650, loss[loss=0.07554, simple_loss=0.09495, pruned_loss=0.0145, audio_tagging_loss=0.01356, over 15733.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.09024, pruned_loss=0.01293, audio_tagging_loss=0.01168, over 2936293.31 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:34:12,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3050353.3333333335, ans=0.1
2023-11-25 20:34:16,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0
2023-11-25 20:34:22,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3050420.0, ans=0.0
2023-11-25 20:34:28,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3050420.0, ans=10.0
2023-11-25 20:34:29,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3050486.6666666665, ans=0.5
2023-11-25 20:34:31,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.926e+01 9.451e+01 1.004e+02 1.811e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-25 20:34:35,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3050486.6666666665, ans=0.125
2023-11-25 20:34:56,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457600
2023-11-25 20:35:02,819 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 700, loss[loss=0.05896, simple_loss=0.08326, pruned_loss=0.00754, audio_tagging_loss=0.009793, over 16839.00 frames. ], tot_loss[loss=0.06926, simple_loss=0.09006, pruned_loss=0.01286, audio_tagging_loss=0.01137, over 2967223.13 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:35:12,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3050753.3333333335, ans=0.0
2023-11-25 20:35:33,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3050820.0, ans=0.2
2023-11-25 20:35:39,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3050886.6666666665, ans=0.0
2023-11-25 20:35:52,104 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457650
2023-11-25 20:35:53,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3050953.3333333335, ans=0.125
2023-11-25 20:35:57,272 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 750, loss[loss=0.05796, simple_loss=0.06846, pruned_loss=0.01077, audio_tagging_loss=0.01296, over 14600.00 frames. ], tot_loss[loss=0.06895, simple_loss=0.08993, pruned_loss=0.01281, audio_tagging_loss=0.01118, over 2986450.03 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:36:11,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3051086.6666666665, ans=0.0
2023-11-25 20:36:17,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3051153.3333333335, ans=0.0
2023-11-25 20:36:19,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.878e+01 9.438e+01 1.006e+02 1.228e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:36:38,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3051220.0, ans=0.125
2023-11-25 20:36:42,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3051286.6666666665, ans=0.0
2023-11-25 20:36:45,748 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457700
2023-11-25 20:36:51,356 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 800, loss[loss=0.06648, simple_loss=0.09039, pruned_loss=0.009299, audio_tagging_loss=0.01199, over 16595.00 frames. ], tot_loss[loss=0.06945, simple_loss=0.09114, pruned_loss=0.01291, audio_tagging_loss=0.01097, over 2998583.11 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:36:52,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3051353.3333333335, ans=0.0
2023-11-25 20:36:58,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3051353.3333333335, ans=0.09899494936611666
2023-11-25 20:36:59,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0
2023-11-25 20:37:05,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0
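[Editor's note: the "Whitening: ... metric=X vs. limit=Y" lines compare a whiteness statistic of intermediate activations against a limit; the auxiliary penalty only engages when the metric exceeds it. A plausible form of the metric for the single-group case (1.0 means fully white, i.e. all covariance eigenvalues equal); the exact formula is in icefall's zipformer scaling.py and this is an assumption, not a copy of it:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one group
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        # trace(C^2) * d / trace(C)^2 == E[eig^2] / E[eig]^2 >= 1, with equality iff white
        return ((cov @ cov).trace() * d / cov.trace() ** 2).item()
]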
2023-11-25 20:37:13,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051486.6666666665, ans=0.1
2023-11-25 20:37:21,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3051486.6666666665, ans=0.125
2023-11-25 20:37:24,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3051553.3333333335, ans=0.125
2023-11-25 20:37:25,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3051553.3333333335, ans=0.125
2023-11-25 20:37:35,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3051620.0, ans=0.0
2023-11-25 20:37:40,164 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457750
2023-11-25 20:37:45,296 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 850, loss[loss=0.05237, simple_loss=0.06554, pruned_loss=0.007732, audio_tagging_loss=0.01187, over 14347.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.09138, pruned_loss=0.01278, audio_tagging_loss=0.01083, over 3011797.53 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:38:08,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.812e+01 9.277e+01 9.996e+01 1.418e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-25 20:38:27,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3051886.6666666665, ans=0.1
2023-11-25 20:38:29,506 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:38:35,682 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457800
2023-11-25 20:38:41,241 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 900, loss[loss=0.0719, simple_loss=0.09998, pruned_loss=0.01179, audio_tagging_loss=0.01012, over 15051.00 frames. ], tot_loss[loss=0.06875, simple_loss=0.09053, pruned_loss=0.01275, audio_tagging_loss=0.01074, over 3017978.99 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:38:45,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3052020.0, ans=0.0
2023-11-25 20:39:02,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3052153.3333333335, ans=0.125
2023-11-25 20:39:27,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3052286.6666666665, ans=0.95
2023-11-25 20:39:29,901 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457850
2023-11-25 20:39:35,112 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 950, loss[loss=0.051, simple_loss=0.0638, pruned_loss=0.009233, audio_tagging_loss=0.009861, over 15517.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.09072, pruned_loss=0.01285, audio_tagging_loss=0.01048, over 3018939.77 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:39:46,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3052420.0, ans=0.125
2023-11-25 20:39:49,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052420.0, ans=0.1
2023-11-25 20:39:51,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3052420.0, ans=0.125
2023-11-25 20:39:58,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.948e+01 9.425e+01 1.001e+02 1.201e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-25 20:40:01,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0
2023-11-25 20:40:20,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3052620.0, ans=0.125
2023-11-25 20:40:23,497 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457900
2023-11-25 20:40:23,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3052620.0, ans=0.2
2023-11-25 20:40:29,261 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1000, loss[loss=0.06985, simple_loss=0.08746, pruned_loss=0.01687, audio_tagging_loss=0.009254, over 15651.00 frames. ], tot_loss[loss=0.068, simple_loss=0.08989, pruned_loss=0.01278, audio_tagging_loss=0.01028, over 3021697.26 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:40:52,822 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:40:53,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3052820.0, ans=0.0
2023-11-25 20:41:19,392 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457950
2023-11-25 20:41:21,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0
2023-11-25 20:41:24,730 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1050, loss[loss=0.06241, simple_loss=0.07622, pruned_loss=0.01433, audio_tagging_loss=0.009975, over 14486.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.08958, pruned_loss=0.01276, audio_tagging_loss=0.01012, over 3021647.50 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:41:24,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3053020.0, ans=0.2
2023-11-25 20:41:26,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5
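[Editor's note: the WARNING above drops 1-second AudioSet cuts whose BPE token count (24) exceeds the frame count after 4x subsampling (23); transducer-style losses require at least as many encoder frames as output tokens. A sketch of such a validity check, where the subsampling arithmetic is a rough approximation of the conv front-end rather than the exact icefall formula:

    def is_valid_for_transducer(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
        t = (num_frames - 7) // subsampling_factor  # approximate post-subsampling length
        return t >= num_tokens

    is_valid_for_transducer(100, 24)  # -> False: 23 frames vs. 24 tokens, so the cut is excluded
]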
2023-11-25 20:41:30,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3053020.0, ans=0.1
2023-11-25 20:41:31,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3053020.0, ans=0.0
2023-11-25 20:41:48,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.789e+01 9.440e+01 1.020e+02 1.231e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:41:58,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3053220.0, ans=0.0
2023-11-25 20:42:05,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3053220.0, ans=0.125
2023-11-25 20:42:07,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3053286.6666666665, ans=0.2
2023-11-25 20:42:11,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0
2023-11-25 20:42:13,596 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458000
2023-11-25 20:42:13,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3053286.6666666665, ans=0.2
2023-11-25 20:42:16,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0
2023-11-25 20:42:19,191 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1100, loss[loss=0.0671, simple_loss=0.0936, pruned_loss=0.01359, audio_tagging_loss=0.006702, over 14938.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.08914, pruned_loss=0.01261, audio_tagging_loss=0.009987, over 3023542.22 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:42:21,269 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:42:21,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3053353.3333333335, ans=0.2
2023-11-25 20:42:28,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=22.5
2023-11-25 20:42:35,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3053420.0, ans=0.125
2023-11-25 20:42:38,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3053420.0, ans=0.0
2023-11-25 20:42:49,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3053486.6666666665, ans=0.125
2023-11-25 20:42:49,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3053486.6666666665, ans=0.07
2023-11-25 20:43:03,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3053620.0, ans=0.09899494936611666
2023-11-25 20:43:04,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=15.0
2023-11-25 20:43:07,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458050
2023-11-25 20:43:12,714 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1150, loss[loss=0.0611, simple_loss=0.07813, pruned_loss=0.01396, audio_tagging_loss=0.008069, over 14098.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.08905, pruned_loss=0.01259, audio_tagging_loss=0.009921, over 3023715.92 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:43:15,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3053686.6666666665, ans=0.125
2023-11-25 20:43:17,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0
2023-11-25 20:43:19,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3053686.6666666665, ans=0.09899494936611666
2023-11-25 20:43:25,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0
2023-11-25 20:43:38,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.733e+01 9.355e+01 1.009e+02 1.328e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-25 20:43:42,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3053820.0, ans=0.125
2023-11-25 20:43:49,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0
2023-11-25 20:43:56,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3053953.3333333335, ans=0.125
2023-11-25 20:43:59,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3053953.3333333335, ans=0.0
2023-11-25 20:44:02,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458100
2023-11-25 20:44:07,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3053953.3333333335, ans=0.125
2023-11-25 20:44:08,910 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1200, loss[loss=0.06886, simple_loss=0.09615, pruned_loss=0.01134, audio_tagging_loss=0.00944, over 15916.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.08926, pruned_loss=0.01258, audio_tagging_loss=0.009739, over 3025769.01 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:44:11,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3054020.0, ans=0.125
2023-11-25 20:44:19,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3054086.6666666665, ans=0.125
2023-11-25 20:44:34,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3054153.3333333335, ans=0.125
2023-11-25 20:44:56,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054286.6666666665, ans=0.1
2023-11-25 20:44:57,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458150
2023-11-25 20:45:01,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3054286.6666666665, ans=0.125
2023-11-25 20:45:02,905 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1250, loss[loss=0.05524, simple_loss=0.07516, pruned_loss=0.006582, audio_tagging_loss=0.01108, over 15299.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.08864, pruned_loss=0.01259, audio_tagging_loss=0.00979, over 3027016.40 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:45:07,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3054353.3333333335, ans=0.125
2023-11-25 20:45:19,761 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:45:21,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3054420.0, ans=0.125
2023-11-25 20:45:27,922 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.172e+01 9.748e+01 1.035e+02 1.279e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-25 20:45:35,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3054553.3333333335, ans=0.125
2023-11-25 20:45:42,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.97 vs. limit=22.5
2023-11-25 20:45:42,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3054553.3333333335, ans=22.5
2023-11-25 20:45:51,560 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458200
2023-11-25 20:45:57,093 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1300, loss[loss=0.05429, simple_loss=0.07931, pruned_loss=0.007153, audio_tagging_loss=0.007481, over 14812.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08845, pruned_loss=0.01253, audio_tagging_loss=0.009724, over 3019838.64 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:46:02,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3054686.6666666665, ans=0.05
2023-11-25 20:46:05,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3054686.6666666665, ans=0.125
2023-11-25 20:46:06,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3054753.3333333335, ans=0.1
2023-11-25 20:46:15,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3054753.3333333335, ans=0.125
2023-11-25 20:46:28,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3054820.0, ans=0.125
2023-11-25 20:46:31,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3054886.6666666665, ans=0.0
2023-11-25 20:46:32,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3054886.6666666665, ans=0.125
2023-11-25 20:46:35,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3054886.6666666665, ans=0.125
2023-11-25 20:46:35,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3054886.6666666665, ans=0.0
2023-11-25 20:46:45,792 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458250
2023-11-25 20:46:52,082 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1350, loss[loss=0.072, simple_loss=0.1007, pruned_loss=0.01366, audio_tagging_loss=0.007994, over 15435.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.08952, pruned_loss=0.01272, audio_tagging_loss=0.009633, over 3024376.44 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:46:56,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3055020.0, ans=0.125
2023-11-25 20:47:01,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3055020.0, ans=0.2
2023-11-25 20:47:13,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3055153.3333333335, ans=0.125
2023-11-25 20:47:16,591 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.731e+01 9.396e+01 1.007e+02 1.248e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-25 20:47:23,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3055220.0, ans=0.0
2023-11-25 20:47:31,803 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:47:41,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458300
2023-11-25 20:47:42,659 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0
2023-11-25 20:47:42,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0
2023-11-25 20:47:46,331 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1400, loss[loss=0.05823, simple_loss=0.08368, pruned_loss=0.006254, audio_tagging_loss=0.01014, over 15463.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.0901, pruned_loss=0.01276, audio_tagging_loss=0.009622, over 3029591.66 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:48:04,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3055420.0, ans=0.1
2023-11-25 20:48:22,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3055553.3333333335, ans=0.1
2023-11-25 20:48:22,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3055553.3333333335, ans=0.125
2023-11-25 20:48:22,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3055553.3333333335, ans=0.125
2023-11-25 20:48:25,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3055553.3333333335, ans=0.0
2023-11-25 20:48:35,020 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458350
2023-11-25 20:48:39,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3055686.6666666665, ans=0.125
2023-11-25 20:48:40,186 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1450, loss[loss=0.06144, simple_loss=0.08033, pruned_loss=0.009386, audio_tagging_loss=0.01189, over 15736.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09022, pruned_loss=0.01274, audio_tagging_loss=0.009665, over 3035114.47 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
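[Editor's note: the fractional frame counts in tot_loss[... over 3035114.47 frames.] suggest the running statistics are exponentially decayed rather than plain sums. A stand-in with reset_interval=200 from the config as the decay horizon; the decay form is an assumption, not lifted from train_asr.py:

    class DecayedTracker:
        def __init__(self, decay=1 - 1 / 200):  # reset_interval=200 in the config above
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss, num_frames):
            # Older batches fade out geometrically; frames become fractional.
            self.loss_sum = self.loss_sum * self.decay + loss * num_frames
            self.frames = self.frames * self.decay + num_frames

        @property
        def avg(self):
            return self.loss_sum / max(self.frames, 1e-8)
]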
2023-11-25 20:48:59,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3055753.3333333335, ans=0.025
2023-11-25 20:49:01,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3055820.0, ans=0.125
2023-11-25 20:49:05,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.671e+01 9.348e+01 1.019e+02 1.564e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-25 20:49:14,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3055886.6666666665, ans=0.05
2023-11-25 20:49:21,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3055886.6666666665, ans=0.125
2023-11-25 20:49:28,710 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458400
2023-11-25 20:49:34,812 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1500, loss[loss=0.06216, simple_loss=0.08103, pruned_loss=0.01132, audio_tagging_loss=0.01033, over 15455.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.08979, pruned_loss=0.01266, audio_tagging_loss=0.009812, over 3039811.51 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:49:51,274 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=22.5
2023-11-25 20:49:55,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3056086.6666666665, ans=0.125
2023-11-25 20:50:06,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0
2023-11-25 20:50:25,377 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458450
2023-11-25 20:50:28,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0
2023-11-25 20:50:30,451 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1550, loss[loss=0.0866, simple_loss=0.1293, pruned_loss=0.01428, audio_tagging_loss=0.007669, over 16000.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09008, pruned_loss=0.01263, audio_tagging_loss=0.009783, over 3038285.61 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:50:54,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.789e+01 9.295e+01 1.002e+02 1.264e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 20:51:19,481 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458500
2023-11-25 20:51:24,711 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1600, loss[loss=0.05327, simple_loss=0.07056, pruned_loss=0.008974, audio_tagging_loss=0.009017, over 13830.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09071, pruned_loss=0.01277, audio_tagging_loss=0.009806, over 3048204.76 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:51:36,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0
2023-11-25 20:51:46,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0
2023-11-25 20:52:05,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3056886.6666666665, ans=0.125
2023-11-25 20:52:13,779 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458550
2023-11-25 20:52:18,985 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1650, loss[loss=0.05685, simple_loss=0.07762, pruned_loss=0.009346, audio_tagging_loss=0.008696, over 16141.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.08935, pruned_loss=0.01239, audio_tagging_loss=0.009963, over 3050292.54 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:52:27,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3057020.0, ans=0.125
2023-11-25 20:52:30,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3057086.6666666665, ans=0.0
2023-11-25 20:52:34,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3057086.6666666665, ans=0.125
2023-11-25 20:52:44,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3057153.3333333335, ans=0.1
2023-11-25 20:52:44,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.829e+01 9.450e+01 1.011e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-25 20:53:02,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3057286.6666666665, ans=10.0
2023-11-25 20:53:03,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3057286.6666666665, ans=0.125
2023-11-25 20:53:09,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458600
2023-11-25 20:53:15,265 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1700, loss[loss=0.05906, simple_loss=0.07763, pruned_loss=0.01139, audio_tagging_loss=0.00885, over 13873.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.08901, pruned_loss=0.01226, audio_tagging_loss=0.01001, over 3050788.22 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:53:35,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3057486.6666666665, ans=0.2
2023-11-25 20:53:42,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3057486.6666666665, ans=0.125
2023-11-25 20:53:46,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3057553.3333333335, ans=0.125
2023-11-25 20:54:05,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458650
2023-11-25 20:54:10,161 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1750, loss[loss=0.08667, simple_loss=0.1282, pruned_loss=0.01583, audio_tagging_loss=0.006733, over 15572.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08899, pruned_loss=0.01227, audio_tagging_loss=0.009799, over 3049944.68 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
], tot_loss[loss=0.06657, simple_loss=0.08899, pruned_loss=0.01227, audio_tagging_loss=0.009799, over 3049944.68 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:54:14,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3057686.6666666665, ans=0.125 2023-11-25 20:54:14,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3057686.6666666665, ans=0.2 2023-11-25 20:54:20,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3057753.3333333335, ans=0.2 2023-11-25 20:54:35,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3057820.0, ans=0.1 2023-11-25 20:54:36,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.599e+01 9.189e+01 9.882e+01 1.189e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-25 20:54:56,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3057953.3333333335, ans=0.125 2023-11-25 20:54:57,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2023-11-25 20:54:59,275 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458700 2023-11-25 20:55:02,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3057953.3333333335, ans=0.0 2023-11-25 20:55:04,393 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1800, loss[loss=0.08061, simple_loss=0.1158, pruned_loss=0.01426, audio_tagging_loss=0.008457, over 15291.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08966, pruned_loss=0.01227, audio_tagging_loss=0.009666, over 3044254.54 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:55:27,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3058153.3333333335, ans=0.0 2023-11-25 20:55:28,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3058153.3333333335, ans=0.125 2023-11-25 20:55:29,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3058153.3333333335, ans=0.0 2023-11-25 20:55:47,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3058286.6666666665, ans=0.0 2023-11-25 20:55:54,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458750 2023-11-25 20:56:00,481 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1850, loss[loss=0.06862, simple_loss=0.08656, pruned_loss=0.01482, audio_tagging_loss=0.01052, over 14995.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.08953, pruned_loss=0.01234, audio_tagging_loss=0.009605, over 3048494.86 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:56:14,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.92 vs. 
limit=15.0 2023-11-25 20:56:26,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.557e+01 9.640e+01 1.041e+02 1.665e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-25 20:56:26,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3058486.6666666665, ans=0.125 2023-11-25 20:56:49,856 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458800 2023-11-25 20:56:55,927 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1900, loss[loss=0.05765, simple_loss=0.08268, pruned_loss=0.008178, audio_tagging_loss=0.008129, over 17430.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.08997, pruned_loss=0.0124, audio_tagging_loss=0.009486, over 3055991.92 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:56:59,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3058686.6666666665, ans=0.0 2023-11-25 20:57:08,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=15.0 2023-11-25 20:57:25,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-25 20:57:26,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3058820.0, ans=0.0 2023-11-25 20:57:38,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3058953.3333333335, ans=0.2 2023-11-25 20:57:44,942 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458850 2023-11-25 20:57:50,129 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1950, loss[loss=0.08696, simple_loss=0.1175, pruned_loss=0.01793, audio_tagging_loss=0.01025, over 15346.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08881, pruned_loss=0.01228, audio_tagging_loss=0.009443, over 3059351.67 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:58:12,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2023-11-25 20:58:16,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.855e+01 9.248e+01 9.928e+01 1.852e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-25 20:58:18,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3059153.3333333335, ans=0.1 2023-11-25 20:58:24,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-25 20:58:39,949 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458900 2023-11-25 20:58:40,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3059286.6666666665, ans=0.0 2023-11-25 20:58:46,026 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2000, loss[loss=0.0663, simple_loss=0.08848, pruned_loss=0.01162, audio_tagging_loss=0.01044, over 15862.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08873, pruned_loss=0.01237, audio_tagging_loss=0.009439, over 3051859.81 frames. 
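The Whitening entries above (scaling.py:1022) log the current value of a whiteness metric for a module's activations against the limit beyond which a corrective penalty would kick in. A plausible form of such a metric, equal to 1 when the per-group feature covariance is proportional to the identity and approaching num_channels when all variance collapses onto one direction; the exact formula in scaling.py may differ:

```python
# Hedged sketch of a covariance-whiteness metric for one channel group.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for a single group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    num_channels = x.shape[1]
    # 1.0 when all eigenvalues are equal (white); num_channels when one
    # direction carries all of the energy
    return num_channels * (eigs ** 2).sum() / (eigs.sum() ** 2 + 1e-20)

x = torch.randn(4000, 384)
print(float(whitening_metric(x)))  # close to 1 for white Gaussian input
```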
], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 20:58:52,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3059353.3333333335, ans=0.125 2023-11-25 20:59:01,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3059420.0, ans=0.125 2023-11-25 20:59:14,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5 2023-11-25 20:59:18,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3059553.3333333335, ans=0.0 2023-11-25 20:59:35,130 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458950 2023-11-25 20:59:40,299 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2050, loss[loss=0.05416, simple_loss=0.06883, pruned_loss=0.007452, audio_tagging_loss=0.01229, over 14942.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08871, pruned_loss=0.01238, audio_tagging_loss=0.009416, over 3036624.36 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:59:45,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=22.5 2023-11-25 21:00:00,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3059753.3333333335, ans=0.125 2023-11-25 21:00:05,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3059820.0, ans=0.1 2023-11-25 21:00:08,231 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.675e+01 9.370e+01 9.808e+01 1.405e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:00:13,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3059886.6666666665, ans=0.125 2023-11-25 21:00:29,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459000 2023-11-25 21:00:30,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3059953.3333333335, ans=10.0 2023-11-25 21:00:35,276 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2100, loss[loss=0.06862, simple_loss=0.09127, pruned_loss=0.01192, audio_tagging_loss=0.01107, over 14598.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08843, pruned_loss=0.01229, audio_tagging_loss=0.009447, over 3040218.90 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:00:45,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3060086.6666666665, ans=0.2 2023-11-25 21:00:49,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2023-11-25 21:00:51,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. 
limit=15.0 2023-11-25 21:01:03,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3060153.3333333335, ans=0.125 2023-11-25 21:01:22,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3060286.6666666665, ans=0.1 2023-11-25 21:01:24,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459050 2023-11-25 21:01:30,068 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2150, loss[loss=0.0777, simple_loss=0.1052, pruned_loss=0.01464, audio_tagging_loss=0.01046, over 15589.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0886, pruned_loss=0.01248, audio_tagging_loss=0.009551, over 3044301.99 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:01:34,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3060353.3333333335, ans=0.0 2023-11-25 21:01:43,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3060420.0, ans=0.125 2023-11-25 21:01:48,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3060420.0, ans=0.0 2023-11-25 21:01:58,071 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.469e+01 9.111e+01 9.697e+01 1.371e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 21:02:03,361 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:02:03,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3060553.3333333335, ans=0.125 2023-11-25 21:02:15,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3060620.0, ans=0.05 2023-11-25 21:02:16,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3060620.0, ans=0.0 2023-11-25 21:02:19,662 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459100 2023-11-25 21:02:24,874 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2200, loss[loss=0.05787, simple_loss=0.06738, pruned_loss=0.0136, audio_tagging_loss=0.01058, over 14736.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08862, pruned_loss=0.01248, audio_tagging_loss=0.009523, over 3044073.64 frames. 
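The grad_scale field in the per-batch lines follows the usual dynamic-loss-scaling pattern for mixed-precision training: it halves when a step produces non-finite gradients (32.0 down to 8.0 across batches 2000 to 2150 above) and grows back after a long enough run of clean steps (16.0 by batch 2400, 32.0 by batch 2800 further down). The training loop itself is not shown in this excerpt; a generic PyTorch sketch of the mechanism, with `compute_loss` as a placeholder:

```python
# Generic dynamic loss scaling for fp16 training (not the icefall code).
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # internally skipped if gradients overflowed
    scaler.update()         # halves or grows the scale logged as grad_scale
```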
], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:02:33,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3060686.6666666665, ans=0.125 2023-11-25 21:02:38,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3060753.3333333335, ans=0.0 2023-11-25 21:03:13,480 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459150 2023-11-25 21:03:17,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3061020.0, ans=0.0 2023-11-25 21:03:18,634 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2250, loss[loss=0.06328, simple_loss=0.09202, pruned_loss=0.01017, audio_tagging_loss=0.007104, over 15152.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.0881, pruned_loss=0.01242, audio_tagging_loss=0.009566, over 3044276.53 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:03:19,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3061020.0, ans=0.125 2023-11-25 21:03:33,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3061086.6666666665, ans=0.1 2023-11-25 21:03:37,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2023-11-25 21:03:41,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=12.0 2023-11-25 21:03:47,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.734e+01 9.500e+01 1.033e+02 1.214e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-25 21:03:53,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3061220.0, ans=0.09899494936611666 2023-11-25 21:04:05,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3061286.6666666665, ans=0.125 2023-11-25 21:04:07,717 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459200 2023-11-25 21:04:10,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3061286.6666666665, ans=0.125 2023-11-25 21:04:13,873 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2300, loss[loss=0.09448, simple_loss=0.1206, pruned_loss=0.02465, audio_tagging_loss=0.009523, over 14070.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08824, pruned_loss=0.01245, audio_tagging_loss=0.009691, over 3035918.56 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:04:14,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0 2023-11-25 21:04:15,133 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:04:22,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3061353.3333333335, ans=0.2 2023-11-25 21:04:26,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.14 vs. 
limit=15.0 2023-11-25 21:04:37,709 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.97 vs. limit=22.5 2023-11-25 21:04:38,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-11-25 21:04:41,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.74 vs. limit=15.0 2023-11-25 21:04:41,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3061486.6666666665, ans=0.0 2023-11-25 21:04:46,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3061553.3333333335, ans=0.125 2023-11-25 21:04:52,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3061553.3333333335, ans=0.0 2023-11-25 21:05:03,119 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:05:03,164 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459250 2023-11-25 21:05:08,341 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2350, loss[loss=0.06049, simple_loss=0.08229, pruned_loss=0.0111, audio_tagging_loss=0.008237, over 14461.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08848, pruned_loss=0.01246, audio_tagging_loss=0.009654, over 3029670.61 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:05:11,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3061686.6666666665, ans=0.0 2023-11-25 21:05:14,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-25 21:05:23,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3061753.3333333335, ans=0.0 2023-11-25 21:05:36,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.670e+01 9.358e+01 1.017e+02 1.318e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-25 21:05:38,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3061820.0, ans=0.125 2023-11-25 21:05:54,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3061953.3333333335, ans=0.0 2023-11-25 21:05:57,233 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459300 2023-11-25 21:06:01,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-11-25 21:06:02,360 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2400, loss[loss=0.06901, simple_loss=0.1055, pruned_loss=0.008928, audio_tagging_loss=0.00733, over 14934.00 frames. 
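The recurring "Exclude cut" warnings all follow the same pattern: an AudioSet cut carrying placeholder text ends up with fewer frames after subsampling (23) than BPE tokens (24), so no monotonic alignment exists for a transducer loss and the cut is dropped. A sketch of such a filter; the frontend arithmetic below is an assumption chosen to reproduce the logged 100 -> 23 frame count:

```python
# Drop cuts whose subsampled frame count cannot cover their token count.
def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    # hypothetical convolutional frontend: 100 input frames -> 23 output
    num_frames = (num_frames_before_subsampling - 7) // 4  # assumption
    return num_frames >= num_tokens

print(keep_cut(100, 24))  # False: matches the excluded AudioSet cuts
```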
], tot_loss[loss=0.06691, simple_loss=0.0896, pruned_loss=0.01245, audio_tagging_loss=0.009667, over 3037232.21 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:06:02,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3062020.0, ans=0.0 2023-11-25 21:06:06,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3062020.0, ans=0.0 2023-11-25 21:06:14,996 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-25 21:06:32,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3062153.3333333335, ans=0.1 2023-11-25 21:06:37,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3062220.0, ans=0.1 2023-11-25 21:06:50,892 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459350 2023-11-25 21:06:56,591 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2450, loss[loss=0.06238, simple_loss=0.08185, pruned_loss=0.01176, audio_tagging_loss=0.009693, over 14958.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.08987, pruned_loss=0.01251, audio_tagging_loss=0.009706, over 3043449.77 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:07:24,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.490e+01 9.348e+01 1.014e+02 1.568e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-25 21:07:27,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3062553.3333333335, ans=0.0 2023-11-25 21:07:35,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3062553.3333333335, ans=0.125 2023-11-25 21:07:40,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3062620.0, ans=0.0 2023-11-25 21:07:40,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3062620.0, ans=0.0 2023-11-25 21:07:45,868 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459400 2023-11-25 21:07:47,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3062620.0, ans=0.125 2023-11-25 21:07:51,342 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2500, loss[loss=0.05832, simple_loss=0.08842, pruned_loss=0.00783, audio_tagging_loss=0.006281, over 14072.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09008, pruned_loss=0.0125, audio_tagging_loss=0.009612, over 3043409.87 frames. 
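The four numbers inside each loss[...] bracket are not independent: the total is reproduced, up to display rounding, by weighting simple_loss by 0.5 and adding pruned_loss and audio_tagging_loss at full weight. Checking this against the batch 2450 entry above:

```python
# Reconstructing the logged total from its components (batch 2450).
simple_loss_scale = 0.5          # inferred from the logged numbers
audio_tagging_loss_scale = 1.0   # inferred from the logged numbers

loss = (simple_loss_scale * 0.08185             # simple_loss
        + 0.01176                               # pruned_loss
        + audio_tagging_loss_scale * 0.009693)  # audio_tagging_loss
print(round(loss, 5))  # 0.06238, the logged loss for this batch
```

The same weights reconcile the tot_loss brackets, which accumulate the same quantities over the running frame counts shown.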
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:08:04,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3062753.3333333335, ans=0.1 2023-11-25 21:08:05,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3062753.3333333335, ans=0.09899494936611666 2023-11-25 21:08:13,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062820.0, ans=0.1 2023-11-25 21:08:15,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3062820.0, ans=0.125 2023-11-25 21:08:21,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3062820.0, ans=0.125 2023-11-25 21:08:39,820 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459450 2023-11-25 21:08:40,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2023-11-25 21:08:42,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3062953.3333333335, ans=0.2 2023-11-25 21:08:42,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3062953.3333333335, ans=0.125 2023-11-25 21:08:44,901 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2550, loss[loss=0.0716, simple_loss=0.1022, pruned_loss=0.009278, audio_tagging_loss=0.0112, over 16720.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09065, pruned_loss=0.01274, audio_tagging_loss=0.009527, over 3043949.33 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:09:06,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3063153.3333333335, ans=0.2 2023-11-25 21:09:13,328 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.639e+01 9.480e+01 1.019e+02 1.523e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-25 21:09:17,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3063220.0, ans=0.0 2023-11-25 21:09:17,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3063220.0, ans=0.2 2023-11-25 21:09:26,030 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:09:33,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459500 2023-11-25 21:09:38,363 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2600, loss[loss=0.06492, simple_loss=0.08715, pruned_loss=0.01207, audio_tagging_loss=0.009269, over 14625.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09024, pruned_loss=0.01265, audio_tagging_loss=0.009388, over 3036831.51 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:09:49,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3063353.3333333335, ans=0.125 2023-11-25 21:09:49,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3063353.3333333335, ans=0.125 2023-11-25 21:10:08,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-11-25 21:10:09,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3063486.6666666665, ans=0.125 2023-11-25 21:10:19,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=12.0 2023-11-25 21:10:20,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3063553.3333333335, ans=0.125 2023-11-25 21:10:21,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3063620.0, ans=0.0 2023-11-25 21:10:28,912 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459550 2023-11-25 21:10:34,009 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2650, loss[loss=0.08333, simple_loss=0.1177, pruned_loss=0.01622, audio_tagging_loss=0.008283, over 15504.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.08995, pruned_loss=0.01257, audio_tagging_loss=0.009309, over 3034515.74 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:10:48,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3063753.3333333335, ans=0.125 2023-11-25 21:10:49,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3063753.3333333335, ans=0.05 2023-11-25 21:10:59,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3063820.0, ans=0.125 2023-11-25 21:11:00,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3063820.0, ans=0.2 2023-11-25 21:11:01,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.401e+01 9.228e+01 9.795e+01 1.294e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-25 21:11:15,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2023-11-25 21:11:22,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459600 2023-11-25 21:11:27,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3064020.0, ans=0.0 2023-11-25 21:11:28,312 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2700, loss[loss=0.07922, simple_loss=0.1015, pruned_loss=0.02002, audio_tagging_loss=0.008454, over 13469.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.08978, pruned_loss=0.01271, audio_tagging_loss=0.009318, over 3039878.53 frames. 
], batch size: 53, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:11:31,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3064020.0, ans=0.0 2023-11-25 21:11:39,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3064086.6666666665, ans=0.1 2023-11-25 21:12:01,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3064220.0, ans=0.95 2023-11-25 21:12:06,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3064220.0, ans=0.1 2023-11-25 21:12:16,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459650 2023-11-25 21:12:17,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3064286.6666666665, ans=0.0 2023-11-25 21:12:21,806 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2750, loss[loss=0.07598, simple_loss=0.1012, pruned_loss=0.01602, audio_tagging_loss=0.009342, over 14937.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.08933, pruned_loss=0.01273, audio_tagging_loss=0.009396, over 3040377.92 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:12:24,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-25 21:12:33,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3064420.0, ans=0.5 2023-11-25 21:12:40,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3064420.0, ans=0.0 2023-11-25 21:12:41,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3064420.0, ans=0.025 2023-11-25 21:12:41,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-25 21:12:43,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3064486.6666666665, ans=0.1 2023-11-25 21:12:49,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3064486.6666666665, ans=0.125 2023-11-25 21:12:50,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.675e+01 9.044e+01 9.844e+01 1.238e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-25 21:13:09,006 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:13:11,077 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459700 2023-11-25 21:13:17,189 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2800, loss[loss=0.06804, simple_loss=0.08684, pruned_loss=0.01522, audio_tagging_loss=0.00941, over 14450.00 frames. 
], tot_loss[loss=0.06703, simple_loss=0.08987, pruned_loss=0.01285, audio_tagging_loss=0.009246, over 3048575.09 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:13:23,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3064686.6666666665, ans=0.125 2023-11-25 21:13:33,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3064753.3333333335, ans=0.0 2023-11-25 21:13:44,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3064820.0, ans=0.0 2023-11-25 21:13:49,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3064886.6666666665, ans=0.125 2023-11-25 21:13:53,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-25 21:13:56,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3064886.6666666665, ans=0.125 2023-11-25 21:14:06,780 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459750 2023-11-25 21:14:11,860 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2850, loss[loss=0.07235, simple_loss=0.1067, pruned_loss=0.0123, audio_tagging_loss=0.006691, over 14702.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09029, pruned_loss=0.01295, audio_tagging_loss=0.009151, over 3040839.59 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:14:13,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3065020.0, ans=0.125 2023-11-25 21:14:30,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3065086.6666666665, ans=0.125 2023-11-25 21:14:32,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0 2023-11-25 21:14:39,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.595e+01 9.142e+01 9.903e+01 1.163e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-25 21:14:40,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3065153.3333333335, ans=0.2 2023-11-25 21:14:48,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3065220.0, ans=0.0 2023-11-25 21:14:57,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3065286.6666666665, ans=0.1 2023-11-25 21:14:58,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3065286.6666666665, ans=0.2 2023-11-25 21:15:00,282 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459800 2023-11-25 21:15:05,826 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2900, loss[loss=0.07596, simple_loss=0.099, pruned_loss=0.01327, audio_tagging_loss=0.0132, over 16236.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09034, pruned_loss=0.01293, audio_tagging_loss=0.009231, over 3040149.06 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:15:09,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3065353.3333333335, ans=0.125 2023-11-25 21:15:21,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3065420.0, ans=0.09899494936611666 2023-11-25 21:15:31,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3065486.6666666665, ans=0.2 2023-11-25 21:15:38,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3065553.3333333335, ans=0.125 2023-11-25 21:15:54,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459850 2023-11-25 21:16:00,335 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2950, loss[loss=0.07312, simple_loss=0.09889, pruned_loss=0.01693, audio_tagging_loss=0.006744, over 15467.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09061, pruned_loss=0.013, audio_tagging_loss=0.009297, over 3046590.08 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:12,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3065753.3333333335, ans=0.125 2023-11-25 21:16:16,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3065753.3333333335, ans=0.125 2023-11-25 21:16:27,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.766e+01 9.471e+01 1.038e+02 1.516e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:16:35,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3065886.6666666665, ans=0.125 2023-11-25 21:16:42,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-25 21:16:44,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-25 21:16:45,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-25 21:16:46,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-25 21:16:49,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459900 2023-11-25 21:16:54,789 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3000, loss[loss=0.06561, simple_loss=0.09218, pruned_loss=0.01132, audio_tagging_loss=0.008205, over 15322.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09163, pruned_loss=0.01306, audio_tagging_loss=0.009312, over 3049922.66 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:54,790 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-25 21:17:26,412 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05939, simple_loss=0.05076, pruned_loss=0.005254, audio_tagging_loss=0.02875, over 4681554.00 frames. 
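Training pauses for a full validation pass on a fixed batch interval; in this excerpt the pass lands exactly on batch 3000 ("Computing validation loss" followed by the validation totals over 4681554 frames). A schematic of the hook, with the interval read off the log rather than the configuration:

```python
# Periodic validation keyed on the batch index within the epoch.
VALID_INTERVAL = 3000  # inferred from the validation pass at batch 3000

def maybe_validate(batch_idx: int, run_validation) -> None:
    if batch_idx % VALID_INTERVAL == 0:
        run_validation()  # logs "Computing validation loss" + the totals
```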
2023-11-25 21:17:26,412 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-25 21:17:52,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3066153.3333333335, ans=0.125 2023-11-25 21:18:04,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3066220.0, ans=0.0 2023-11-25 21:18:16,061 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459950 2023-11-25 21:18:21,909 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3050, loss[loss=0.06837, simple_loss=0.09109, pruned_loss=0.01175, audio_tagging_loss=0.01107, over 15459.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09233, pruned_loss=0.01303, audio_tagging_loss=0.009196, over 3046668.76 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:18:29,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2023-11-25 21:18:33,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=22.5 2023-11-25 21:18:46,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3066486.6666666665, ans=0.1 2023-11-25 21:18:49,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.558e+01 9.280e+01 1.009e+02 1.298e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 21:18:53,084 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:18:54,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1 2023-11-25 21:18:56,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1 2023-11-25 21:19:11,070 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460000 2023-11-25 21:19:19,031 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3100, loss[loss=0.07378, simple_loss=0.1074, pruned_loss=0.01242, audio_tagging_loss=0.007677, over 14725.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09244, pruned_loss=0.01296, audio_tagging_loss=0.009202, over 3047448.39 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:19:21,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3066686.6666666665, ans=0.125 2023-11-25 21:20:07,918 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460050 2023-11-25 21:20:13,159 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3150, loss[loss=0.05445, simple_loss=0.07011, pruned_loss=0.007854, audio_tagging_loss=0.01154, over 15706.00 frames. 
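The "Maximum memory allocated so far is 26096MB" line printed right after the validation totals is the standard CUDA peak-allocation counter for this rank's device; whether the MB conversion divides by 10**6 or 2**20 is an assumption here:

```python
# Peak CUDA memory for the current device, formatted like the log line.
import torch

if torch.cuda.is_available():
    peak_mb = torch.cuda.max_memory_allocated() // 2**20  # divisor assumed
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```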
], tot_loss[loss=0.06805, simple_loss=0.09172, pruned_loss=0.01283, audio_tagging_loss=0.00936, over 3050166.34 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:20:23,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3067086.6666666665, ans=0.1 2023-11-25 21:20:42,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.732e+01 9.474e+01 1.004e+02 1.246e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-25 21:21:03,059 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460100 2023-11-25 21:21:09,289 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3200, loss[loss=0.07003, simple_loss=0.09504, pruned_loss=0.01408, audio_tagging_loss=0.00843, over 14790.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09161, pruned_loss=0.01275, audio_tagging_loss=0.009481, over 3049937.08 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:21:10,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3067353.3333333335, ans=0.2 2023-11-25 21:21:19,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3067420.0, ans=0.2 2023-11-25 21:21:20,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3067420.0, ans=0.125 2023-11-25 21:21:21,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-25 21:21:26,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3067420.0, ans=0.1 2023-11-25 21:21:27,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=22.5 2023-11-25 21:21:32,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3067486.6666666665, ans=0.125 2023-11-25 21:21:35,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=22.5 2023-11-25 21:21:53,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0 2023-11-25 21:21:58,633 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460150 2023-11-25 21:22:01,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-25 21:22:03,794 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3250, loss[loss=0.05253, simple_loss=0.06632, pruned_loss=0.007591, audio_tagging_loss=0.01178, over 15359.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09123, pruned_loss=0.01259, audio_tagging_loss=0.00945, over 3053324.43 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:22:06,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3067686.6666666665, ans=0.0 2023-11-25 21:22:11,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-25 21:22:22,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3067753.3333333335, ans=0.125 2023-11-25 21:22:32,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.616e+01 9.104e+01 1.013e+02 1.269e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-25 21:22:53,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460200 2023-11-25 21:22:56,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3067953.3333333335, ans=0.1 2023-11-25 21:22:59,146 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3300, loss[loss=0.04743, simple_loss=0.07026, pruned_loss=0.005972, audio_tagging_loss=0.00633, over 14982.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09163, pruned_loss=0.01266, audio_tagging_loss=0.009555, over 3051667.59 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:23:00,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=3068020.0, ans=0.1 2023-11-25 21:23:16,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=15.0 2023-11-25 21:23:17,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3068086.6666666665, ans=0.125 2023-11-25 21:23:24,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3068153.3333333335, ans=0.125 2023-11-25 21:23:33,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3068220.0, ans=0.0 2023-11-25 21:23:48,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460250 2023-11-25 21:23:54,387 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3350, loss[loss=0.04772, simple_loss=0.05071, pruned_loss=0.01048, audio_tagging_loss=0.01189, over 14780.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.09171, pruned_loss=0.01256, audio_tagging_loss=0.009475, over 3048335.88 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:23:58,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3068353.3333333335, ans=0.0 2023-11-25 21:24:22,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.673e+01 9.369e+01 1.012e+02 1.333e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:24:29,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3068553.3333333335, ans=0.0 2023-11-25 21:24:44,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. 
limit=10.0 2023-11-25 21:24:44,748 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460300 2023-11-25 21:24:49,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.80 vs. limit=6.0 2023-11-25 21:24:50,009 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3400, loss[loss=0.0574, simple_loss=0.07959, pruned_loss=0.0102, audio_tagging_loss=0.007398, over 15442.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09222, pruned_loss=0.01262, audio_tagging_loss=0.009287, over 3057998.24 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:24:58,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-25 21:24:59,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3068753.3333333335, ans=0.0 2023-11-25 21:25:11,601 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:25:11,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0 2023-11-25 21:25:15,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3068820.0, ans=0.2 2023-11-25 21:25:22,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3068886.6666666665, ans=0.0 2023-11-25 21:25:39,052 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460350 2023-11-25 21:25:44,244 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3450, loss[loss=0.07967, simple_loss=0.1085, pruned_loss=0.01848, audio_tagging_loss=0.006953, over 15301.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09178, pruned_loss=0.01269, audio_tagging_loss=0.009177, over 3057472.39 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:25:45,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3069020.0, ans=0.5 2023-11-25 21:25:45,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. 
limit=15.0 2023-11-25 21:25:51,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3069020.0, ans=0.025 2023-11-25 21:25:56,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3069086.6666666665, ans=0.125 2023-11-25 21:25:59,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3069086.6666666665, ans=0.125 2023-11-25 21:26:13,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.826e+01 9.469e+01 1.006e+02 1.325e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:26:18,194 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:26:22,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069220.0, ans=0.1 2023-11-25 21:26:28,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3069286.6666666665, ans=0.0 2023-11-25 21:26:34,488 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460400 2023-11-25 21:26:40,240 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3500, loss[loss=0.05654, simple_loss=0.06775, pruned_loss=0.01194, audio_tagging_loss=0.01072, over 14723.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09142, pruned_loss=0.01261, audio_tagging_loss=0.009078, over 3050172.48 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:27:08,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-25 21:27:08,721 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:27:30,914 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460450 2023-11-25 21:27:33,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.90 vs. limit=5.0 2023-11-25 21:27:36,131 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3550, loss[loss=0.06372, simple_loss=0.08648, pruned_loss=0.01126, audio_tagging_loss=0.009214, over 15774.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09105, pruned_loss=0.01269, audio_tagging_loss=0.009092, over 3040943.94 frames. 
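Progress lines like the batch 3500 entry above have a stable enough shape to be mined for learning curves. A small self-contained parser for the tot_loss total and the learning rate; the abbreviated sample line stands in for the full entry:

```python
# Extract (epoch, batch, tot_loss, lr) from train_asr.py progress lines.
import re

PATTERN = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
    r"tot_loss\[loss=(?P<loss>[\d.]+),.*?"
    r"lr: (?P<lr>[\d.e+-]+)"
)

line = ("2023-11-25 21:26:40,240 INFO [train_asr.py:1235] (2/4) Epoch 39, "
        "batch 3500, loss[...], tot_loss[loss=0.0674, simple_loss=0.09142, "
        "pruned_loss=0.01261, audio_tagging_loss=0.009078, over 3050172.48 "
        "frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0")
m = PATTERN.search(line)
print(m.group("epoch"), m.group("batch"), m.group("loss"), m.group("lr"))
# -> 39 3500 0.0674 1.74e-03
```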
], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:27:41,766 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:27:56,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3069820.0, ans=0.125 2023-11-25 21:27:56,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3069820.0, ans=0.125 2023-11-25 21:28:03,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3069820.0, ans=0.0 2023-11-25 21:28:05,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.535e+01 9.230e+01 9.860e+01 1.398e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-25 21:28:09,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5 2023-11-25 21:28:13,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2023-11-25 21:28:17,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3069886.6666666665, ans=0.2 2023-11-25 21:28:23,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3069953.3333333335, ans=0.1 2023-11-25 21:28:24,818 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460500 2023-11-25 21:28:30,026 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3600, loss[loss=0.06601, simple_loss=0.09729, pruned_loss=0.01072, audio_tagging_loss=0.006642, over 14989.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09123, pruned_loss=0.01278, audio_tagging_loss=0.009074, over 3040580.95 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:28:56,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3070153.3333333335, ans=0.0 2023-11-25 21:29:06,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3070220.0, ans=0.0 2023-11-25 21:29:19,209 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460550 2023-11-25 21:29:24,888 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3650, loss[loss=0.04154, simple_loss=0.04989, pruned_loss=0.005013, audio_tagging_loss=0.01158, over 14634.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09098, pruned_loss=0.01286, audio_tagging_loss=0.009085, over 3041558.94 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:29:25,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2023-11-25 21:29:32,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3070353.3333333335, ans=0.125 2023-11-25 21:29:36,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3070420.0, ans=0.0 2023-11-25 21:29:37,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3070420.0, ans=0.0 2023-11-25 21:29:39,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3070420.0, ans=0.0 2023-11-25 21:29:39,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3070420.0, ans=0.125 2023-11-25 21:29:49,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3070486.6666666665, ans=0.125 2023-11-25 21:29:54,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.628e+01 9.158e+01 1.002e+02 1.364e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-25 21:30:03,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3070553.3333333335, ans=0.125 2023-11-25 21:30:15,077 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460600 2023-11-25 21:30:16,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3070620.0, ans=0.0 2023-11-25 21:30:20,513 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3700, loss[loss=0.08037, simple_loss=0.1145, pruned_loss=0.01678, audio_tagging_loss=0.006355, over 15449.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09111, pruned_loss=0.01289, audio_tagging_loss=0.009055, over 3044413.52 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:30:34,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3070753.3333333335, ans=0.0 2023-11-25 21:30:39,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3070753.3333333335, ans=0.0 2023-11-25 21:30:41,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3070820.0, ans=0.125 2023-11-25 21:31:09,773 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460650 2023-11-25 21:31:12,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3070953.3333333335, ans=0.0 2023-11-25 21:31:14,932 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3750, loss[loss=0.06752, simple_loss=0.08602, pruned_loss=0.01239, audio_tagging_loss=0.01212, over 14528.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09233, pruned_loss=0.01321, audio_tagging_loss=0.00907, over 3048758.10 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:31:25,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3071086.6666666665, ans=0.0 2023-11-25 21:31:44,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. 
limit=15.0 2023-11-25 21:31:45,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.797e+01 9.429e+01 1.022e+02 1.345e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-25 21:31:47,782 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:31:53,787 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:31:54,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3071220.0, ans=0.0 2023-11-25 21:32:04,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460700 2023-11-25 21:32:09,373 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3800, loss[loss=0.08185, simple_loss=0.1089, pruned_loss=0.01875, audio_tagging_loss=0.008644, over 15769.00 frames. ], tot_loss[loss=0.06858, simple_loss=0.09233, pruned_loss=0.01328, audio_tagging_loss=0.009143, over 3049951.23 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:32:09,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3071353.3333333335, ans=0.05 2023-11-25 21:32:22,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. limit=10.0 2023-11-25 21:32:43,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3071553.3333333335, ans=0.2 2023-11-25 21:32:53,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3071620.0, ans=0.0 2023-11-25 21:32:53,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3071620.0, ans=0.2 2023-11-25 21:32:59,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460750 2023-11-25 21:33:02,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3071620.0, ans=0.125 2023-11-25 21:33:05,938 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3850, loss[loss=0.07388, simple_loss=0.1023, pruned_loss=0.01083, audio_tagging_loss=0.0119, over 15732.00 frames. ], tot_loss[loss=0.06826, simple_loss=0.0917, pruned_loss=0.01308, audio_tagging_loss=0.009326, over 3052311.56 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:33:06,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0 2023-11-25 21:33:30,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3071820.0, ans=0.1 2023-11-25 21:33:33,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.31 vs. 
limit=15.0 2023-11-25 21:33:33,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0 2023-11-25 21:33:34,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.515e+01 9.072e+01 9.640e+01 1.260e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-25 21:33:35,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3071820.0, ans=0.125 2023-11-25 21:33:38,046 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:33:55,464 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460800 2023-11-25 21:34:00,958 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3900, loss[loss=0.07792, simple_loss=0.1122, pruned_loss=0.01644, audio_tagging_loss=0.005384, over 15693.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09101, pruned_loss=0.01297, audio_tagging_loss=0.009376, over 3047300.19 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:34:28,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3072153.3333333335, ans=0.2 2023-11-25 21:34:35,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3072220.0, ans=0.0 2023-11-25 21:34:50,181 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460850 2023-11-25 21:34:52,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3072286.6666666665, ans=0.0 2023-11-25 21:34:52,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3072286.6666666665, ans=0.2 2023-11-25 21:34:55,275 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3950, loss[loss=0.07132, simple_loss=0.09011, pruned_loss=0.01387, audio_tagging_loss=0.01238, over 15097.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09015, pruned_loss=0.01271, audio_tagging_loss=0.00961, over 3039827.25 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:35:05,991 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:35:06,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=22.5 2023-11-25 21:35:08,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3072420.0, ans=0.0 2023-11-25 21:35:08,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-25 21:35:17,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3072486.6666666665, ans=0.1 2023-11-25 21:35:24,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0 2023-11-25 21:35:26,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. 
limit=15.0 2023-11-25 21:35:26,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.593e+01 9.164e+01 9.900e+01 1.243e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-25 21:35:34,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3072553.3333333335, ans=0.0 2023-11-25 21:35:45,370 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460900 2023-11-25 21:35:51,039 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4000, loss[loss=0.06533, simple_loss=0.09416, pruned_loss=0.009721, audio_tagging_loss=0.008527, over 15297.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09027, pruned_loss=0.01277, audio_tagging_loss=0.009599, over 3044762.99 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:35:58,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2023-11-25 21:36:34,668 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:36:40,658 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460950 2023-11-25 21:36:40,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3072953.3333333335, ans=0.0 2023-11-25 21:36:43,974 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:36:46,395 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4050, loss[loss=0.06296, simple_loss=0.07867, pruned_loss=0.01171, audio_tagging_loss=0.01192, over 15836.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.09151, pruned_loss=0.013, audio_tagging_loss=0.009496, over 3041237.09 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:36:50,645 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 21:37:01,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3073086.6666666665, ans=0.1 2023-11-25 21:37:04,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3073086.6666666665, ans=0.0 2023-11-25 21:37:15,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3073153.3333333335, ans=0.125 2023-11-25 21:37:16,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3073153.3333333335, ans=15.0 2023-11-25 21:37:19,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.880e+01 9.594e+01 1.042e+02 1.593e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-25 21:37:20,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3073220.0, ans=12.0 2023-11-25 21:37:35,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461000 2023-11-25 21:37:41,381 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4100, loss[loss=0.06544, simple_loss=0.08844, pruned_loss=0.01334, audio_tagging_loss=0.007888, over 14253.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09123, pruned_loss=0.01294, audio_tagging_loss=0.009399, over 3036407.73 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:37:43,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-11-25 21:37:51,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3073420.0, ans=0.0 2023-11-25 21:37:56,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3073420.0, ans=0.125 2023-11-25 21:38:26,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3073620.0, ans=0.125 2023-11-25 21:38:30,709 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461050 2023-11-25 21:38:36,933 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4150, loss[loss=0.07369, simple_loss=0.1032, pruned_loss=0.01438, audio_tagging_loss=0.007707, over 15018.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09099, pruned_loss=0.01271, audio_tagging_loss=0.009252, over 3038449.76 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:38:37,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.32 vs. limit=10.0 2023-11-25 21:38:57,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3073820.0, ans=0.0 2023-11-25 21:39:00,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3073820.0, ans=0.125 2023-11-25 21:39:09,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.760e+01 9.274e+01 9.766e+01 1.268e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-25 21:39:17,895 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:39:26,817 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461100 2023-11-25 21:39:32,019 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4200, loss[loss=0.06986, simple_loss=0.09185, pruned_loss=0.01442, audio_tagging_loss=0.009525, over 15500.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09063, pruned_loss=0.01284, audio_tagging_loss=0.009049, over 3040654.53 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:39:42,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3074086.6666666665, ans=0.0 2023-11-25 21:40:00,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3074153.3333333335, ans=0.0 2023-11-25 21:40:06,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3074220.0, ans=0.125 2023-11-25 21:40:08,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3074220.0, ans=0.2 2023-11-25 21:40:17,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2023-11-25 21:40:21,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461150 2023-11-25 21:40:21,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3074286.6666666665, ans=0.125 2023-11-25 21:40:23,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3074286.6666666665, ans=0.5 2023-11-25 21:40:26,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3074353.3333333335, ans=0.0 2023-11-25 21:40:27,313 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4250, loss[loss=0.06531, simple_loss=0.08699, pruned_loss=0.01294, audio_tagging_loss=0.008872, over 14665.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08968, pruned_loss=0.01256, audio_tagging_loss=0.009088, over 3037950.39 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:40:31,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074353.3333333335, ans=0.1 2023-11-25 21:40:53,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.85 vs. 
limit=15.0 2023-11-25 21:40:55,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3074486.6666666665, ans=0.125 2023-11-25 21:40:58,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3074486.6666666665, ans=0.125 2023-11-25 21:41:00,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.649e+01 9.520e+01 1.007e+02 1.325e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-25 21:41:12,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3074620.0, ans=0.0 2023-11-25 21:41:16,478 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461200 2023-11-25 21:41:17,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3074620.0, ans=0.125 2023-11-25 21:41:22,523 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4300, loss[loss=0.05408, simple_loss=0.07274, pruned_loss=0.009919, audio_tagging_loss=0.007794, over 14658.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0898, pruned_loss=0.01262, audio_tagging_loss=0.009049, over 3036584.45 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:41:24,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3074686.6666666665, ans=0.125 2023-11-25 21:41:31,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3074686.6666666665, ans=0.0 2023-11-25 21:41:38,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3074753.3333333335, ans=0.2 2023-11-25 21:41:40,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3074753.3333333335, ans=0.2 2023-11-25 21:41:48,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2023-11-25 21:41:48,416 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-11-25 21:41:58,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3074886.6666666665, ans=0.025 2023-11-25 21:42:13,022 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461250 2023-11-25 21:42:13,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3074953.3333333335, ans=0.0 2023-11-25 21:42:15,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3074953.3333333335, ans=0.2 2023-11-25 21:42:17,432 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:42:18,143 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4350, loss[loss=0.06431, simple_loss=0.07757, pruned_loss=0.01379, audio_tagging_loss=0.01173, over 14767.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.0907, pruned_loss=0.0127, audio_tagging_loss=0.009045, over 3036644.95 frames. 
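], batch size: 57, lr: 1.74e-03, grad_scale: 8.0

The optim.py lines report the distribution of recent gradient norms (the five numbers read as min / 25% / median / 75% / max) together with a clipping threshold. Throughout this section the threshold is Clipping_scale times the logged median, e.g. 2.0 * 9.520e+01 = 1.904e+02 in the 21:41:00 entry above, and percent-clipped reports how often that threshold was exceeded. A sketch of that bookkeeping (an illustration of the relationship visible in the log, not the optim.py internals):

import torch

def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    # recent_grad_norms: gradient norms observed over the last few batches.
    norms = torch.tensor(recent_grad_norms)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]          # 2.0 x median
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

# With a median norm of 9.520e+01 this yields threshold = 1.904e+02,
# matching "threshold=1.904e+02, percent-clipped=0.0" above.
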
2023-11-25 21:42:23,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. limit=6.0 2023-11-25 21:42:25,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3075020.0, ans=0.125 2023-11-25 21:42:51,203 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.819e+01 9.341e+01 1.009e+02 3.956e+02, threshold=1.868e+02, percent-clipped=1.0 2023-11-25 21:42:52,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3075220.0, ans=0.0 2023-11-25 21:42:56,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3075220.0, ans=0.1 2023-11-25 21:42:58,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3075220.0, ans=0.125 2023-11-25 21:43:07,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461300 2023-11-25 21:43:12,647 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4400, loss[loss=0.07956, simple_loss=0.1107, pruned_loss=0.01441, audio_tagging_loss=0.009784, over 15046.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09142, pruned_loss=0.01279, audio_tagging_loss=0.009089, over 3031348.14 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:43:20,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3075353.3333333335, ans=0.0 2023-11-25 21:43:21,138 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-25 21:43:21,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3075353.3333333335, ans=0.125 2023-11-25 21:43:43,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-11-25 21:43:46,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3075553.3333333335, ans=0.09899494936611666 2023-11-25 21:43:54,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3075553.3333333335, ans=0.0 2023-11-25 21:43:57,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3075620.0, ans=0.125 2023-11-25 21:44:02,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461350 2023-11-25 21:44:07,976 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4450, loss[loss=0.05199, simple_loss=0.06015, pruned_loss=0.01035, audio_tagging_loss=0.01156, over 15512.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09172, pruned_loss=0.01276, audio_tagging_loss=0.008901, over 3036140.19 frames.
], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:44:25,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3075753.3333333335, ans=0.2 2023-11-25 21:44:41,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.665e+01 9.390e+01 1.023e+02 1.193e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-25 21:44:52,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.83 vs. limit=10.0 2023-11-25 21:44:54,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3075953.3333333335, ans=0.125 2023-11-25 21:44:57,539 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461400 2023-11-25 21:45:03,434 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4500, loss[loss=0.05867, simple_loss=0.06781, pruned_loss=0.01059, audio_tagging_loss=0.01417, over 14662.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09169, pruned_loss=0.0128, audio_tagging_loss=0.008928, over 3033776.79 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:45:04,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.13 vs. limit=10.0 2023-11-25 21:45:06,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3076020.0, ans=0.125 2023-11-25 21:45:16,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2023-11-25 21:45:34,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3076220.0, ans=0.015 2023-11-25 21:45:49,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3076286.6666666665, ans=0.0 2023-11-25 21:45:52,281 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461450 2023-11-25 21:45:57,458 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4550, loss[loss=0.07956, simple_loss=0.1154, pruned_loss=0.0144, audio_tagging_loss=0.007477, over 15565.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.0916, pruned_loss=0.01283, audio_tagging_loss=0.009024, over 3040458.72 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:46:31,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.489e+01 8.972e+01 9.819e+01 1.195e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-25 21:46:32,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3076553.3333333335, ans=0.09899494936611666 2023-11-25 21:46:40,045 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 21:46:46,251 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461500 2023-11-25 21:46:46,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3076620.0, ans=0.0 2023-11-25 21:46:51,813 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4600, loss[loss=0.05052, simple_loss=0.07039, pruned_loss=0.006037, audio_tagging_loss=0.009292, over 15045.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09168, pruned_loss=0.01282, audio_tagging_loss=0.009117, over 3040518.15 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:46:59,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3076686.6666666665, ans=0.125 2023-11-25 21:46:59,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-25 21:47:03,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5 2023-11-25 21:47:30,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-25 21:47:41,685 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461550 2023-11-25 21:47:44,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3076953.3333333335, ans=0.125 2023-11-25 21:47:47,364 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4650, loss[loss=0.06667, simple_loss=0.08543, pruned_loss=0.01418, audio_tagging_loss=0.00978, over 14775.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09072, pruned_loss=0.01284, audio_tagging_loss=0.009227, over 3035768.48 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:47:53,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3077020.0, ans=0.07 2023-11-25 21:48:12,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2023-11-25 21:48:12,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3077153.3333333335, ans=22.5 2023-11-25 21:48:20,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.563e+01 9.172e+01 1.006e+02 1.160e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-25 21:48:25,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-25 21:48:36,252 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461600 2023-11-25 21:48:41,721 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4700, loss[loss=0.0642, simple_loss=0.08768, pruned_loss=0.01155, audio_tagging_loss=0.008811, over 15462.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.0897, pruned_loss=0.01265, audio_tagging_loss=0.009316, over 3042463.15 frames. 
], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:49:08,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3077486.6666666665, ans=0.125 2023-11-25 21:49:18,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3077553.3333333335, ans=0.125 2023-11-25 21:49:29,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461650 2023-11-25 21:49:33,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3077620.0, ans=0.125 2023-11-25 21:49:34,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3077686.6666666665, ans=0.1 2023-11-25 21:49:35,059 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4750, loss[loss=0.07098, simple_loss=0.09349, pruned_loss=0.01442, audio_tagging_loss=0.009808, over 14596.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08884, pruned_loss=0.0125, audio_tagging_loss=0.00936, over 3037345.31 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:49:43,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2023-11-25 21:50:04,728 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2023-11-25 21:50:07,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3077886.6666666665, ans=0.2 2023-11-25 21:50:09,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.924e+01 9.307e+01 1.025e+02 1.203e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-25 21:50:13,681 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:50:16,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3077886.6666666665, ans=0.125 2023-11-25 21:50:17,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3077953.3333333335, ans=0.0 2023-11-25 21:50:19,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3077953.3333333335, ans=0.125 2023-11-25 21:50:25,011 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461700 2023-11-25 21:50:30,571 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4800, loss[loss=0.05717, simple_loss=0.07003, pruned_loss=0.01061, audio_tagging_loss=0.01155, over 14385.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08929, pruned_loss=0.0125, audio_tagging_loss=0.009476, over 3041423.11 frames. 
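], batch size: 55, lr: 1.74e-03, grad_scale: 16.0

The four bracketed terms in each loss[...] entry are not independent: across this section the total satisfies loss ~ 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. the simple (linear) transducer loss enters at half weight while the pruned RNN-T loss and the audio-tagging distillation loss enter at full weight. These weights are read off from the logged numbers themselves; a quick check against the batch 4800 totals just above:

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5):
    # Recombine the logged components with the inferred 0.5/1.0/1.0 weights.
    return simple_scale * simple_loss + pruned_loss + audio_tagging_loss

# tot_loss of batch 4800: 0.5*0.08929 + 0.0125 + 0.009476 = 0.066621
assert abs(combined_loss(0.08929, 0.0125, 0.009476) - 0.06663) < 1e-4
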
2023-11-25 21:50:30,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3078020.0, ans=0.1 2023-11-25 21:50:37,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3078020.0, ans=0.125 2023-11-25 21:50:41,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3078086.6666666665, ans=0.125 2023-11-25 21:51:02,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3078220.0, ans=0.015 2023-11-25 21:51:19,472 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461750 2023-11-25 21:51:24,560 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4850, loss[loss=0.05408, simple_loss=0.06647, pruned_loss=0.01049, audio_tagging_loss=0.01036, over 16064.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.08982, pruned_loss=0.0127, audio_tagging_loss=0.009516, over 3044087.32 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:51:26,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3078353.3333333335, ans=0.0 2023-11-25 21:51:37,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2023-11-25 21:51:45,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3078486.6666666665, ans=10.0 2023-11-25 21:51:54,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-25 21:51:57,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3078553.3333333335, ans=0.125 2023-11-25 21:51:58,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.740e+01 9.474e+01 1.031e+02 1.193e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-25 21:52:12,710 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461800 2023-11-25 21:52:12,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3078620.0, ans=0.1 2023-11-25 21:52:17,357 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:52:18,133 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4900, loss[loss=0.07959, simple_loss=0.1064, pruned_loss=0.01384, audio_tagging_loss=0.01257, over 14600.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.08966, pruned_loss=0.01259, audio_tagging_loss=0.009407, over 3043430.54 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:52:24,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3078686.6666666665, ans=0.125 2023-11-25 21:52:33,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.67 vs.
limit=22.5 2023-11-25 21:52:49,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3078820.0, ans=0.5 2023-11-25 21:52:52,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0 2023-11-25 21:53:07,338 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461850 2023-11-25 21:53:12,988 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4950, loss[loss=0.06629, simple_loss=0.09759, pruned_loss=0.01075, audio_tagging_loss=0.006749, over 15755.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.0892, pruned_loss=0.01243, audio_tagging_loss=0.009252, over 3044920.79 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:53:17,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3079020.0, ans=0.0 2023-11-25 21:53:20,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-25 21:53:34,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.80 vs. limit=10.0 2023-11-25 21:53:46,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.693e+01 9.293e+01 9.943e+01 1.246e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 21:53:55,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3079286.6666666665, ans=0.125 2023-11-25 21:54:02,747 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461900 2023-11-25 21:54:02,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079286.6666666665, ans=0.1 2023-11-25 21:54:07,910 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5000, loss[loss=0.0672, simple_loss=0.09101, pruned_loss=0.0139, audio_tagging_loss=0.007789, over 16140.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08946, pruned_loss=0.01259, audio_tagging_loss=0.009063, over 3037389.40 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:54:17,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3079420.0, ans=0.125 2023-11-25 21:54:20,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3079420.0, ans=0.0 2023-11-25 21:54:52,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3079620.0, ans=0.125 2023-11-25 21:54:53,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3079620.0, ans=0.2 2023-11-25 21:54:55,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. 
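limit=10.0

The scaling.py Whitening entries track how far a layer's output covariance is from white (isotropic): a measured statistic is compared against the module's whitening limit (itself a schedulable value, as the whitening_limit entries elsewhere in this log show), and a corrective gradient applies when the limit is exceeded. The exact metric lives in scaling.py; a rough proxy for the quantity being compared (my own illustration, not the project's formula) is the spread of the covariance eigenvalues:

import torch

def whiteness_metric(features: torch.Tensor) -> float:
    # features: (num_frames, num_channels). Returns 1.0 for perfectly
    # white (isotropic) outputs and grows as the eigen-spectrum of the
    # covariance becomes more lopsided.
    x = features - features.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

# Read "metric=5.98 vs. limit=10.0" above as: that layer's outputs are
# still comfortably inside the allowed anisotropy, so no penalty applies.
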
2023-11-25 21:54:56,529 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461950 2023-11-25 21:54:56,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3079620.0, ans=0.0 2023-11-25 21:55:01,803 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5050, loss[loss=0.06128, simple_loss=0.08474, pruned_loss=0.01166, audio_tagging_loss=0.007256, over 15246.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09022, pruned_loss=0.01272, audio_tagging_loss=0.00902, over 3045224.36 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:55:07,443 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-11-25 21:55:35,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.539e+01 9.066e+01 9.676e+01 1.144e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-25 21:55:36,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3079886.6666666665, ans=0.07 2023-11-25 21:55:37,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3079886.6666666665, ans=0.1 2023-11-25 21:55:39,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079886.6666666665, ans=0.1 2023-11-25 21:55:45,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3079953.3333333335, ans=0.1 2023-11-25 21:55:50,299 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462000 2023-11-25 21:55:56,005 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5100, loss[loss=0.04474, simple_loss=0.05535, pruned_loss=0.00753, audio_tagging_loss=0.009533, over 14312.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.0901, pruned_loss=0.01272, audio_tagging_loss=0.008975, over 3049394.28 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:56:06,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0 2023-11-25 21:56:14,315 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:56:41,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3080286.6666666665, ans=0.0 2023-11-25 21:56:45,791 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462050 2023-11-25 21:56:51,454 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5150, loss[loss=0.06448, simple_loss=0.07648, pruned_loss=0.01592, audio_tagging_loss=0.01032, over 16064.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.08994, pruned_loss=0.01273, audio_tagging_loss=0.009024, over 3046410.38 frames.
], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:57:25,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.763e+01 9.349e+01 9.902e+01 1.210e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-25 21:57:31,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-25 21:57:40,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462100 2023-11-25 21:57:45,284 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5200, loss[loss=0.06002, simple_loss=0.08966, pruned_loss=0.00804, audio_tagging_loss=0.007151, over 15006.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08863, pruned_loss=0.01245, audio_tagging_loss=0.008971, over 3050364.15 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:57:51,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3080686.6666666665, ans=0.125 2023-11-25 21:57:55,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.56 vs. limit=10.0 2023-11-25 21:57:57,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3080753.3333333335, ans=0.125 2023-11-25 21:57:59,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3080753.3333333335, ans=0.5 2023-11-25 21:58:02,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3080753.3333333335, ans=0.0 2023-11-25 21:58:18,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3080886.6666666665, ans=0.2 2023-11-25 21:58:33,949 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462150 2023-11-25 21:58:39,641 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5250, loss[loss=0.07221, simple_loss=0.08939, pruned_loss=0.01783, audio_tagging_loss=0.009687, over 14402.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08942, pruned_loss=0.01259, audio_tagging_loss=0.008948, over 3041717.11 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:58:52,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3081086.6666666665, ans=0.0 2023-11-25 21:59:14,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.546e+01 9.251e+01 9.912e+01 1.159e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-25 21:59:19,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.28 vs. limit=8.0 2023-11-25 21:59:29,354 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462200 2023-11-25 21:59:33,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=8.0 2023-11-25 21:59:34,808 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5300, loss[loss=0.08663, simple_loss=0.1174, pruned_loss=0.0187, audio_tagging_loss=0.009243, over 14316.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08973, pruned_loss=0.01268, audio_tagging_loss=0.008955, over 3032885.27 frames. 
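], batch size: 53, lr: 1.74e-03, grad_scale: 16.0

grad_scale in these loss lines is the fp16 loss scale, and it moves in powers of two: batches 5200 and 5250 above run at 32.0, while batch 5300 reports 16.0, the usual halving of a dynamic loss scaler after a scaled gradient overflows; after enough clean steps the scale is grown again (it cycles between 8.0 and 32.0 throughout this section). The generic PyTorch pattern behind this behaviour (a sketch of the standard torch.cuda.amp recipe, not the train_asr.py loop itself):

import torch

scaler = torch.cuda.amp.GradScaler()  # starts at 2**16 and adapts from there

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch).mean()  # placeholder objective
    scaler.scale(loss).backward()   # backprop the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # halves the scale on overflow, else
                                    # doubles it after a growth interval
    return scaler.get_scale()       # the value logged as grad_scale
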
2023-11-25 21:59:37,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3081353.3333333335, ans=0.125 2023-11-25 21:59:49,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-25 21:59:57,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3081486.6666666665, ans=0.125 2023-11-25 22:00:06,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3081553.3333333335, ans=0.1 2023-11-25 22:00:20,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3081620.0, ans=0.07 2023-11-25 22:00:23,664 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462250 2023-11-25 22:00:29,309 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5350, loss[loss=0.04843, simple_loss=0.05571, pruned_loss=0.007594, audio_tagging_loss=0.01298, over 15550.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09006, pruned_loss=0.01289, audio_tagging_loss=0.00899, over 3025002.24 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:00:35,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3081686.6666666665, ans=0.125 2023-11-25 22:00:36,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-25 22:00:42,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3081753.3333333335, ans=0.1 2023-11-25 22:00:51,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3081820.0, ans=0.1 2023-11-25 22:01:02,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3081886.6666666665, ans=0.2 2023-11-25 22:01:02,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-25 22:01:03,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3081886.6666666665, ans=0.07 2023-11-25 22:01:04,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.802e+01 9.245e+01 1.006e+02 1.324e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 22:01:15,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3081953.3333333335, ans=0.125 2023-11-25 22:01:18,126 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462300 2023-11-25 22:01:23,233 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5400, loss[loss=0.06137, simple_loss=0.08458, pruned_loss=0.009428, audio_tagging_loss=0.009645, over 15976.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09016, pruned_loss=0.0129, audio_tagging_loss=0.009083, over 3036907.01 frames.
], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:01:30,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3082020.0, ans=0.0 2023-11-25 22:01:43,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3082086.6666666665, ans=0.0 2023-11-25 22:01:50,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3082153.3333333335, ans=0.125 2023-11-25 22:02:13,202 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462350 2023-11-25 22:02:14,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3082286.6666666665, ans=0.09899494936611666 2023-11-25 22:02:15,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3082286.6666666665, ans=0.125 2023-11-25 22:02:18,839 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5450, loss[loss=0.06352, simple_loss=0.07552, pruned_loss=0.01396, audio_tagging_loss=0.01179, over 15485.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09057, pruned_loss=0.01291, audio_tagging_loss=0.00912, over 3040082.33 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:02:25,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3082353.3333333335, ans=0.125 2023-11-25 22:02:40,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3082486.6666666665, ans=0.125 2023-11-25 22:02:53,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.739e+01 9.442e+01 1.018e+02 1.459e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-25 22:03:01,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3082620.0, ans=0.1 2023-11-25 22:03:04,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3082620.0, ans=0.0 2023-11-25 22:03:07,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462400 2023-11-25 22:03:12,950 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5500, loss[loss=0.05683, simple_loss=0.07662, pruned_loss=0.008495, audio_tagging_loss=0.01002, over 13874.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09058, pruned_loss=0.01273, audio_tagging_loss=0.009161, over 3045124.92 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:03:28,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3082753.3333333335, ans=0.1 2023-11-25 22:03:40,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3082820.0, ans=0.125 2023-11-25 22:03:40,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. 
limit=15.0 2023-11-25 22:04:02,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462450 2023-11-25 22:04:02,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3082953.3333333335, ans=0.0 2023-11-25 22:04:03,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3082953.3333333335, ans=0.125 2023-11-25 22:04:07,411 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5550, loss[loss=0.06657, simple_loss=0.09037, pruned_loss=0.01039, audio_tagging_loss=0.011, over 16613.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09072, pruned_loss=0.0127, audio_tagging_loss=0.009266, over 3052223.13 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:04:37,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3083153.3333333335, ans=0.125 2023-11-25 22:04:42,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.648e+01 9.293e+01 9.970e+01 1.288e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 22:04:56,918 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462500 2023-11-25 22:05:03,009 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5600, loss[loss=0.07025, simple_loss=0.08289, pruned_loss=0.0151, audio_tagging_loss=0.01371, over 15153.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09029, pruned_loss=0.01254, audio_tagging_loss=0.009403, over 3060265.51 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:05:23,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3083486.6666666665, ans=0.125 2023-11-25 22:05:27,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3083486.6666666665, ans=0.1 2023-11-25 22:05:27,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2023-11-25 22:05:41,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3083553.3333333335, ans=0.05 2023-11-25 22:05:43,049 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:05:50,369 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-25 22:05:51,813 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462550 2023-11-25 22:05:56,944 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5650, loss[loss=0.0584, simple_loss=0.08412, pruned_loss=0.00969, audio_tagging_loss=0.006644, over 14835.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.08979, pruned_loss=0.01243, audio_tagging_loss=0.009424, over 3057501.52 frames. 
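], batch size: 57, lr: 1.74e-03, grad_scale: 32.0

The bulk of the scaling.py traffic above is ScheduledFloat entries: regularisation hyper-parameters (dropout probabilities, skip rates, balancer bounds) that are functions of batch_count rather than fixed constants, with ans giving the value in force at that point; by batch_count ~ 3.08e6 most have long since flattened to their final values. A minimal piecewise-linear version of such a schedule follows (assuming a (batch_count, value) breakpoint form; the real class in scaling.py carries more machinery):

import bisect

def scheduled_float(batch_count: float, points) -> float:
    # points: ascending (batch_count, value) breakpoints. The value is
    # interpolated between breakpoints and held constant outside them.
    xs, ys = zip(*points)
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, batch_count) - 1
    t = (batch_count - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

# e.g. a dropout that decays early in training and is flat by now:
# scheduled_float(3083686.0, [(0.0, 0.3), (20000.0, 0.1)]) -> 0.1
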
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:05:58,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3083686.6666666665, ans=0.0 2023-11-25 22:06:09,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3083753.3333333335, ans=0.125 2023-11-25 22:06:30,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3083886.6666666665, ans=0.0 2023-11-25 22:06:32,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.554e+01 9.201e+01 9.858e+01 1.570e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-25 22:06:45,795 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462600 2023-11-25 22:06:47,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-11-25 22:06:51,774 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5700, loss[loss=0.05517, simple_loss=0.06159, pruned_loss=0.01149, audio_tagging_loss=0.01289, over 15894.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.08998, pruned_loss=0.01256, audio_tagging_loss=0.009402, over 3059063.05 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:07:06,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3084086.6666666665, ans=0.125 2023-11-25 22:07:18,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3084153.3333333335, ans=0.0 2023-11-25 22:07:19,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3084153.3333333335, ans=0.0 2023-11-25 22:07:20,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-25 22:07:20,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2023-11-25 22:07:41,330 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462650 2023-11-25 22:07:41,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3084286.6666666665, ans=0.0 2023-11-25 22:07:44,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=12.0 2023-11-25 22:07:46,929 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5750, loss[loss=0.06025, simple_loss=0.07628, pruned_loss=0.01169, audio_tagging_loss=0.01042, over 14739.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09015, pruned_loss=0.01258, audio_tagging_loss=0.009275, over 3056382.79 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:07:48,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3084353.3333333335, ans=0.125 2023-11-25 22:07:50,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.67 vs. 
limit=15.0 2023-11-25 22:08:07,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2023-11-25 22:08:21,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.632e+01 9.114e+01 9.911e+01 1.968e+02, threshold=1.823e+02, percent-clipped=1.0 2023-11-25 22:08:35,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2023-11-25 22:08:36,375 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462700 2023-11-25 22:08:41,436 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5800, loss[loss=0.05215, simple_loss=0.07649, pruned_loss=0.006595, audio_tagging_loss=0.007316, over 16577.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09, pruned_loss=0.01249, audio_tagging_loss=0.009171, over 3057045.92 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:08:53,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3084753.3333333335, ans=0.0 2023-11-25 22:08:53,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3084753.3333333335, ans=0.125 2023-11-25 22:08:54,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3084753.3333333335, ans=0.0 2023-11-25 22:09:11,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2023-11-25 22:09:11,660 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-25 22:09:13,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3084886.6666666665, ans=0.125 2023-11-25 22:09:30,145 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462750 2023-11-25 22:09:35,303 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5850, loss[loss=0.0495, simple_loss=0.05809, pruned_loss=0.008346, audio_tagging_loss=0.01211, over 13493.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08992, pruned_loss=0.0126, audio_tagging_loss=0.00913, over 3048943.45 frames. 
], batch size: 54, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:09:36,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3085020.0, ans=0.125 2023-11-25 22:10:03,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085153.3333333335, ans=0.1 2023-11-25 22:10:12,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.550e+01 9.214e+01 9.901e+01 1.645e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-25 22:10:16,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3085220.0, ans=0.125 2023-11-25 22:10:21,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3085286.6666666665, ans=0.125 2023-11-25 22:10:24,277 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462800 2023-11-25 22:10:27,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3085286.6666666665, ans=0.0 2023-11-25 22:10:30,184 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5900, loss[loss=0.0803, simple_loss=0.112, pruned_loss=0.01632, audio_tagging_loss=0.007973, over 15715.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09018, pruned_loss=0.01253, audio_tagging_loss=0.009074, over 3046939.79 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:10:46,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3085420.0, ans=0.0 2023-11-25 22:11:19,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462850 2023-11-25 22:11:24,927 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5950, loss[loss=0.08125, simple_loss=0.1061, pruned_loss=0.01722, audio_tagging_loss=0.011, over 14854.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09124, pruned_loss=0.01279, audio_tagging_loss=0.009005, over 3057679.53 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:11:29,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3085686.6666666665, ans=0.0 2023-11-25 22:11:35,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.51 vs. limit=10.0 2023-11-25 22:11:38,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3085753.3333333335, ans=0.125 2023-11-25 22:11:47,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085820.0, ans=0.1 2023-11-25 22:12:02,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.667e+01 9.136e+01 9.803e+01 1.331e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-25 22:12:14,025 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-25 22:12:16,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3085953.3333333335, ans=0.0 2023-11-25 22:12:19,207 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6000, loss[loss=0.06952, simple_loss=0.09272, pruned_loss=0.01238, audio_tagging_loss=0.01078, over 15093.00 frames. 
], tot_loss[loss=0.06725, simple_loss=0.09085, pruned_loss=0.01265, audio_tagging_loss=0.009174, over 3057180.90 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:12:19,208 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-25 22:12:29,879 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.9414, 5.4095, 5.8019, 5.1508], device='cuda:2') 2023-11-25 22:12:49,486 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.5630, 3.1479, 2.8243, 3.1991], device='cuda:2') 2023-11-25 22:12:50,937 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05816, simple_loss=0.05073, pruned_loss=0.00518, audio_tagging_loss=0.02762, over 4681554.00 frames. 2023-11-25 22:12:50,938 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-25 22:13:06,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3086086.6666666665, ans=0.0 2023-11-25 22:13:16,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=12.0 2023-11-25 22:13:31,829 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:13:40,639 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-25 22:13:40,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3086286.6666666665, ans=0.2 2023-11-25 22:13:45,809 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6050, loss[loss=0.05106, simple_loss=0.06326, pruned_loss=0.009318, audio_tagging_loss=0.01011, over 15248.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09066, pruned_loss=0.01256, audio_tagging_loss=0.009146, over 3060936.99 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:13:53,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.18 vs. 
limit=15.0 2023-11-25 22:13:58,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3086420.0, ans=0.0 2023-11-25 22:14:06,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3086486.6666666665, ans=0.125 2023-11-25 22:14:15,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3086486.6666666665, ans=0.0 2023-11-25 22:14:23,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.707e+01 9.356e+01 1.011e+02 1.518e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 22:14:27,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086553.3333333335, ans=0.1 2023-11-25 22:14:34,224 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:14:35,133 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-25 22:14:35,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3086620.0, ans=0.2 2023-11-25 22:14:40,563 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6100, loss[loss=0.08109, simple_loss=0.1182, pruned_loss=0.01601, audio_tagging_loss=0.006011, over 15469.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08978, pruned_loss=0.01261, audio_tagging_loss=0.009135, over 3059001.82 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:14:43,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3086686.6666666665, ans=0.2 2023-11-25 22:14:46,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-25 22:14:47,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-25 22:14:48,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-25 22:14:51,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3086753.3333333335, ans=0.125 2023-11-25 22:14:54,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3086753.3333333335, ans=0.0 2023-11-25 22:14:59,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3086753.3333333335, ans=0.2 2023-11-25 22:15:27,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3086953.3333333335, ans=0.2 2023-11-25 22:15:29,930 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-25 22:15:36,157 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6150, loss[loss=0.05469, simple_loss=0.07157, pruned_loss=0.01013, audio_tagging_loss=0.00878, over 15141.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08947, pruned_loss=0.01261, audio_tagging_loss=0.009216, over 3043083.16 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:15:38,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-11-25 22:15:38,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2023-11-25 22:15:41,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3087020.0, ans=0.0 2023-11-25 22:15:45,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3087020.0, ans=0.125 2023-11-25 22:15:53,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3087086.6666666665, ans=0.5 2023-11-25 22:16:02,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3087153.3333333335, ans=0.0 2023-11-25 22:16:10,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3087220.0, ans=0.2 2023-11-25 22:16:14,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.733e+01 9.242e+01 9.873e+01 1.239e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-25 22:16:15,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087220.0, ans=0.1 2023-11-25 22:16:25,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3087286.6666666665, ans=0.125 2023-11-25 22:16:26,392 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-25 22:16:31,581 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6200, loss[loss=0.07327, simple_loss=0.0987, pruned_loss=0.01595, audio_tagging_loss=0.007968, over 14758.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09041, pruned_loss=0.01266, audio_tagging_loss=0.009117, over 3049196.62 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:16:32,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2023-11-25 22:16:34,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3087353.3333333335, ans=0.125 2023-11-25 22:16:43,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3087420.0, ans=0.125 2023-11-25 22:16:52,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3087486.6666666665, ans=0.125 2023-11-25 22:17:03,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087553.3333333335, ans=0.1 2023-11-25 22:17:06,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3087553.3333333335, ans=0.0 2023-11-25 22:17:06,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. 
limit=10.0 2023-11-25 22:17:19,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3087620.0, ans=0.125 2023-11-25 22:17:20,711 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-25 22:17:25,791 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6250, loss[loss=0.05432, simple_loss=0.07307, pruned_loss=0.007727, audio_tagging_loss=0.01006, over 16007.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09015, pruned_loss=0.01268, audio_tagging_loss=0.009137, over 3050087.84 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:17:39,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3087753.3333333335, ans=0.125 2023-11-25 22:17:53,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3087820.0, ans=0.125 2023-11-25 22:17:53,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3087820.0, ans=0.0 2023-11-25 22:17:54,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3087820.0, ans=0.125 2023-11-25 22:17:57,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2023-11-25 22:18:04,822 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.687e+01 9.120e+01 9.665e+01 2.497e+02, threshold=1.824e+02, percent-clipped=1.0 2023-11-25 22:18:07,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3087886.6666666665, ans=0.125 2023-11-25 22:18:11,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2023-11-25 22:18:15,287 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-25 22:18:16,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3087953.3333333335, ans=0.05 2023-11-25 22:18:16,718 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2023-11-25 22:18:21,352 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6300, loss[loss=0.08711, simple_loss=0.1134, pruned_loss=0.02008, audio_tagging_loss=0.01031, over 15229.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09144, pruned_loss=0.01288, audio_tagging_loss=0.009192, over 3054828.63 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:18:29,872 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:18:34,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3088086.6666666665, ans=0.125 2023-11-25 22:18:43,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-25 22:18:59,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3088220.0, ans=0.125 2023-11-25 22:19:06,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3088286.6666666665, ans=0.125 2023-11-25 22:19:11,751 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-25 22:19:16,997 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6350, loss[loss=0.05868, simple_loss=0.0691, pruned_loss=0.01276, audio_tagging_loss=0.01138, over 15751.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09094, pruned_loss=0.01286, audio_tagging_loss=0.009152, over 3050780.49 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:19:22,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3088353.3333333335, ans=0.2 2023-11-25 22:19:32,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3088420.0, ans=0.0 2023-11-25 22:19:32,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3088420.0, ans=0.2 2023-11-25 22:19:51,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3088553.3333333335, ans=0.2 2023-11-25 22:19:56,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.465e+01 9.257e+01 9.775e+01 1.191e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 22:20:07,200 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-25 22:20:11,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2023-11-25 22:20:12,514 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6400, loss[loss=0.03915, simple_loss=0.04371, pruned_loss=0.004224, audio_tagging_loss=0.01307, over 16104.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09056, pruned_loss=0.01273, audio_tagging_loss=0.009308, over 3048272.03 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:20:20,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3088686.6666666665, ans=0.125 2023-11-25 22:20:23,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-11-25 22:20:27,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. 
limit=15.0 2023-11-25 22:20:58,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-25 22:20:58,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3088953.3333333335, ans=0.09899494936611666 2023-11-25 22:21:01,532 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-25 22:21:01,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-25 22:21:04,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088953.3333333335, ans=0.1 2023-11-25 22:21:06,722 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6450, loss[loss=0.06204, simple_loss=0.08252, pruned_loss=0.0117, audio_tagging_loss=0.009078, over 15333.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09033, pruned_loss=0.01266, audio_tagging_loss=0.009386, over 3042022.64 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:21:17,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-25 22:21:27,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3089086.6666666665, ans=0.0 2023-11-25 22:21:37,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3089153.3333333335, ans=0.125 2023-11-25 22:21:45,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.513e+01 9.181e+01 1.004e+02 1.135e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-25 22:21:50,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=22.5 2023-11-25 22:21:56,999 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-25 22:21:58,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3089286.6666666665, ans=0.125 2023-11-25 22:22:03,075 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6500, loss[loss=0.06843, simple_loss=0.0959, pruned_loss=0.01215, audio_tagging_loss=0.008334, over 14982.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.08933, pruned_loss=0.01258, audio_tagging_loss=0.009438, over 3041623.05 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:22:03,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.50 vs. limit=22.5 2023-11-25 22:22:05,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3089353.3333333335, ans=0.0 2023-11-25 22:22:15,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3089420.0, ans=0.07 2023-11-25 22:22:24,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs. 
limit=12.0 2023-11-25 22:22:25,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3089486.6666666665, ans=0.125 2023-11-25 22:22:29,244 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0 2023-11-25 22:22:36,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3089553.3333333335, ans=0.0 2023-11-25 22:22:43,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3089553.3333333335, ans=0.95 2023-11-25 22:22:49,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3089620.0, ans=0.0 2023-11-25 22:22:51,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3089620.0, ans=0.0 2023-11-25 22:22:52,360 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-25 22:22:54,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3089620.0, ans=0.0 2023-11-25 22:22:58,093 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6550, loss[loss=0.06193, simple_loss=0.08442, pruned_loss=0.008887, audio_tagging_loss=0.01083, over 16196.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08956, pruned_loss=0.01256, audio_tagging_loss=0.009242, over 3046395.65 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:23:01,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3089686.6666666665, ans=15.0 2023-11-25 22:23:04,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2023-11-25 22:23:05,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3089686.6666666665, ans=0.07 2023-11-25 22:23:30,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3089886.6666666665, ans=0.0 2023-11-25 22:23:36,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.654e+01 9.097e+01 9.635e+01 1.212e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-25 22:23:47,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-25 22:23:52,785 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6600, loss[loss=0.06033, simple_loss=0.08049, pruned_loss=0.01106, audio_tagging_loss=0.009031, over 15080.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08884, pruned_loss=0.01247, audio_tagging_loss=0.009159, over 3049726.94 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:24:02,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3090020.0, ans=0.07 2023-11-25 22:24:22,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.68 vs. 
limit=22.5 2023-11-25 22:24:38,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3090286.6666666665, ans=0.1 2023-11-25 22:24:42,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-25 22:24:42,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3090286.6666666665, ans=0.125 2023-11-25 22:24:43,448 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-25 22:24:49,276 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6650, loss[loss=0.0717, simple_loss=0.09638, pruned_loss=0.01337, audio_tagging_loss=0.01015, over 15191.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08882, pruned_loss=0.01241, audio_tagging_loss=0.009075, over 3050740.72 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:25:27,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.685e+01 9.321e+01 9.946e+01 1.416e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-25 22:25:38,715 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-25 22:25:44,051 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6700, loss[loss=0.06473, simple_loss=0.08785, pruned_loss=0.01045, audio_tagging_loss=0.01035, over 15055.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08923, pruned_loss=0.01252, audio_tagging_loss=0.009073, over 3044527.88 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:25:53,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3090686.6666666665, ans=10.0 2023-11-25 22:26:10,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3090820.0, ans=0.125 2023-11-25 22:26:33,546 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-25 22:26:38,700 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6750, loss[loss=0.06997, simple_loss=0.0991, pruned_loss=0.01324, audio_tagging_loss=0.007183, over 15598.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08991, pruned_loss=0.01262, audio_tagging_loss=0.009, over 3046470.08 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:26:42,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3091020.0, ans=0.0 2023-11-25 22:26:43,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3091020.0, ans=0.125 2023-11-25 22:26:51,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3091086.6666666665, ans=0.125 2023-11-25 22:27:17,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.613e+01 9.173e+01 9.716e+01 1.152e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-25 22:27:28,299 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-25 22:27:33,913 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6800, loss[loss=0.05363, simple_loss=0.06508, pruned_loss=0.01021, audio_tagging_loss=0.01088, over 15074.00 frames. 
], tot_loss[loss=0.06644, simple_loss=0.08953, pruned_loss=0.01261, audio_tagging_loss=0.009061, over 3037979.99 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:27:39,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3091353.3333333335, ans=0.0 2023-11-25 22:27:45,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3091420.0, ans=0.0 2023-11-25 22:28:11,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3091553.3333333335, ans=0.125 2023-11-25 22:28:23,726 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-25 22:28:28,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3091686.6666666665, ans=0.125 2023-11-25 22:28:28,854 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6850, loss[loss=0.06716, simple_loss=0.09228, pruned_loss=0.01138, audio_tagging_loss=0.009636, over 14862.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08933, pruned_loss=0.01259, audio_tagging_loss=0.008962, over 3031909.50 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:28:30,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3091686.6666666665, ans=0.125 2023-11-25 22:28:38,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3091753.3333333335, ans=0.0 2023-11-25 22:28:39,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3091753.3333333335, ans=0.0 2023-11-25 22:28:47,402 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:29:06,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=8.0 2023-11-25 22:29:07,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.919e+01 8.654e+01 9.393e+01 1.015e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:29:18,078 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-25 22:29:22,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3092020.0, ans=0.125 2023-11-25 22:29:23,602 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6900, loss[loss=0.0777, simple_loss=0.1168, pruned_loss=0.01195, audio_tagging_loss=0.007354, over 15602.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08964, pruned_loss=0.01272, audio_tagging_loss=0.008936, over 3036704.38 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:29:28,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3092020.0, ans=0.0 2023-11-25 22:29:40,332 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:29:44,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3092086.6666666665, ans=0.125 2023-11-25 22:29:48,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3092153.3333333335, ans=0.0 2023-11-25 22:29:49,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3092153.3333333335, ans=0.1 2023-11-25 22:29:58,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092220.0, ans=0.1 2023-11-25 22:29:58,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3092220.0, ans=0.07 2023-11-25 22:30:03,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092220.0, ans=0.1 2023-11-25 22:30:08,454 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:30:13,783 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-25 22:30:20,064 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6950, loss[loss=0.07264, simple_loss=0.09868, pruned_loss=0.0164, audio_tagging_loss=0.006896, over 15147.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.08959, pruned_loss=0.01283, audio_tagging_loss=0.008997, over 3030888.36 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:30:26,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2023-11-25 22:30:39,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. 
limit=10.0 2023-11-25 22:30:42,781 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:30:48,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3092486.6666666665, ans=0.0 2023-11-25 22:30:58,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.697e+01 9.205e+01 9.794e+01 1.442e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-25 22:31:01,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3092553.3333333335, ans=0.09899494936611666 2023-11-25 22:31:09,791 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-25 22:31:15,076 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7000, loss[loss=0.06787, simple_loss=0.09008, pruned_loss=0.01479, audio_tagging_loss=0.008038, over 15156.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08963, pruned_loss=0.01278, audio_tagging_loss=0.009044, over 3032908.83 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:31:18,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092686.6666666665, ans=0.1 2023-11-25 22:31:22,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3092686.6666666665, ans=0.125 2023-11-25 22:31:37,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3092820.0, ans=0.125 2023-11-25 22:31:57,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. limit=15.0 2023-11-25 22:32:04,321 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-25 22:32:09,451 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7050, loss[loss=0.0671, simple_loss=0.08898, pruned_loss=0.01395, audio_tagging_loss=0.008659, over 16048.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08906, pruned_loss=0.01271, audio_tagging_loss=0.00922, over 3027736.72 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:32:48,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.460e+01 9.019e+01 9.979e+01 1.338e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-25 22:32:50,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-25 22:32:58,700 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-25 22:33:07,446 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7100, loss[loss=0.05632, simple_loss=0.06875, pruned_loss=0.009757, audio_tagging_loss=0.01219, over 15266.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.0893, pruned_loss=0.01269, audio_tagging_loss=0.009343, over 3031051.43 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:33:32,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. 
limit=15.0 2023-11-25 22:33:42,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3093553.3333333335, ans=0.0 2023-11-25 22:33:46,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3093553.3333333335, ans=0.125 2023-11-25 22:33:53,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2023-11-25 22:33:57,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-25 22:33:59,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3093620.0, ans=0.125 2023-11-25 22:34:02,474 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7150, loss[loss=0.0769, simple_loss=0.107, pruned_loss=0.01574, audio_tagging_loss=0.007635, over 16105.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.0901, pruned_loss=0.01268, audio_tagging_loss=0.009202, over 3037376.65 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:34:25,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3093820.0, ans=0.125 2023-11-25 22:34:32,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3093820.0, ans=0.2 2023-11-25 22:34:33,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3093820.0, ans=0.09899494936611666 2023-11-25 22:34:40,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.669e+01 9.271e+01 1.002e+02 1.351e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-25 22:34:51,295 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-25 22:34:56,611 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7200, loss[loss=0.05434, simple_loss=0.0653, pruned_loss=0.009922, audio_tagging_loss=0.01177, over 13959.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.08986, pruned_loss=0.01265, audio_tagging_loss=0.009334, over 3038391.94 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:34:57,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3094020.0, ans=0.125 2023-11-25 22:34:59,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3094020.0, ans=0.2 2023-11-25 22:35:13,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3094086.6666666665, ans=0.1 2023-11-25 22:35:33,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3094220.0, ans=0.125 2023-11-25 22:35:34,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3094220.0, ans=0.1 2023-11-25 22:35:45,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-25 22:35:51,610 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7250, loss[loss=0.05974, simple_loss=0.08058, pruned_loss=0.01151, audio_tagging_loss=0.007941, over 15945.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0889, pruned_loss=0.01243, audio_tagging_loss=0.009441, over 3038807.18 frames. 
], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:35:57,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3094353.3333333335, ans=0.0 2023-11-25 22:36:02,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.44 vs. limit=10.0 2023-11-25 22:36:06,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-11-25 22:36:11,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3094420.0, ans=0.0 2023-11-25 22:36:24,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3094553.3333333335, ans=0.2 2023-11-25 22:36:31,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.827e+01 9.307e+01 1.005e+02 1.461e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-25 22:36:39,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3094620.0, ans=0.2 2023-11-25 22:36:42,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-25 22:36:48,028 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7300, loss[loss=0.06786, simple_loss=0.09671, pruned_loss=0.01277, audio_tagging_loss=0.006732, over 15053.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08899, pruned_loss=0.01235, audio_tagging_loss=0.009335, over 3037644.54 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:36:56,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.16 vs. limit=10.0 2023-11-25 22:37:04,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.28 vs. limit=10.0 2023-11-25 22:37:06,915 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:37:20,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3094886.6666666665, ans=0.0 2023-11-25 22:37:24,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3094886.6666666665, ans=0.1 2023-11-25 22:37:34,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-11-25 22:37:37,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-25 22:37:41,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3095020.0, ans=0.0 2023-11-25 22:37:42,345 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7350, loss[loss=0.06967, simple_loss=0.09923, pruned_loss=0.01273, audio_tagging_loss=0.007322, over 13390.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09058, pruned_loss=0.01267, audio_tagging_loss=0.009162, over 3043200.02 frames. 
], batch size: 52, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:37:48,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3095020.0, ans=0.2 2023-11-25 22:37:56,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3095086.6666666665, ans=10.0 2023-11-25 22:37:56,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3095086.6666666665, ans=0.1 2023-11-25 22:38:02,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-25 22:38:06,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=3095153.3333333335, ans=12.0 2023-11-25 22:38:23,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.376e+01 8.699e+01 9.551e+01 1.020e+02 2.458e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-25 22:38:30,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3095286.6666666665, ans=0.125 2023-11-25 22:38:31,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-25 22:38:36,925 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7400, loss[loss=0.07777, simple_loss=0.1095, pruned_loss=0.01461, audio_tagging_loss=0.008432, over 15503.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09036, pruned_loss=0.01262, audio_tagging_loss=0.00904, over 3051289.68 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:38:41,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2023-11-25 22:38:48,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3095420.0, ans=0.125 2023-11-25 22:39:02,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3095486.6666666665, ans=0.125 2023-11-25 22:39:23,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3095620.0, ans=0.1 2023-11-25 22:39:26,759 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-25 22:39:27,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3095620.0, ans=0.125 2023-11-25 22:39:32,916 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7450, loss[loss=0.05359, simple_loss=0.06922, pruned_loss=0.01096, audio_tagging_loss=0.008026, over 14801.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09013, pruned_loss=0.01259, audio_tagging_loss=0.009002, over 3049243.38 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:39:34,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3095686.6666666665, ans=0.0 2023-11-25 22:39:54,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3095820.0, ans=10.0 2023-11-25 22:39:59,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095820.0, ans=0.1 2023-11-25 22:40:13,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.803e+01 9.393e+01 1.013e+02 1.307e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:40:18,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3095953.3333333335, ans=0.125 2023-11-25 22:40:21,922 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-25 22:40:27,441 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7500, loss[loss=0.07416, simple_loss=0.09256, pruned_loss=0.0154, audio_tagging_loss=0.01248, over 14785.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09044, pruned_loss=0.01262, audio_tagging_loss=0.00896, over 3052993.77 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:40:28,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096020.0, ans=0.1 2023-11-25 22:40:41,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096086.6666666665, ans=0.1 2023-11-25 22:40:48,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096153.3333333335, ans=0.1 2023-11-25 22:41:11,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3096286.6666666665, ans=0.0 2023-11-25 22:41:16,954 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-25 22:41:22,304 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7550, loss[loss=0.07151, simple_loss=0.102, pruned_loss=0.01348, audio_tagging_loss=0.007032, over 14382.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09056, pruned_loss=0.01289, audio_tagging_loss=0.008959, over 3049785.52 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:41:23,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-11-25 22:41:37,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3096420.0, ans=0.125 2023-11-25 22:41:38,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3096420.0, ans=0.0 2023-11-25 22:42:02,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.821e+01 8.728e+01 9.410e+01 1.018e+02 1.180e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-25 22:42:12,502 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464500 2023-11-25 22:42:13,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.79 vs. 
2023-11-25 22:42:18,117 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7600, loss[loss=0.0728, simple_loss=0.1007, pruned_loss=0.0163, audio_tagging_loss=0.006165, over 15653.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09103, pruned_loss=0.013, audio_tagging_loss=0.008871, over 3048614.54 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:42:26,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0
2023-11-25 22:42:49,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3096886.6666666665, ans=0.125
2023-11-25 22:42:52,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3096886.6666666665, ans=0.125
2023-11-25 22:42:59,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3096886.6666666665, ans=0.125
2023-11-25 22:43:04,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3096953.3333333335, ans=0.0
2023-11-25 22:43:07,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464550
2023-11-25 22:43:13,185 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7650, loss[loss=0.06775, simple_loss=0.1015, pruned_loss=0.01036, audio_tagging_loss=0.006622, over 15495.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09095, pruned_loss=0.01292, audio_tagging_loss=0.008825, over 3048342.72 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:43:13,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3097020.0, ans=0.2
2023-11-25 22:43:22,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3097086.6666666665, ans=0.0
2023-11-25 22:43:25,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3097086.6666666665, ans=0.2
2023-11-25 22:43:29,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3097086.6666666665, ans=0.125
2023-11-25 22:43:55,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.616e+01 9.118e+01 9.857e+01 1.270e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-25 22:43:59,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3097286.6666666665, ans=0.125
2023-11-25 22:44:02,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464600
2023-11-25 22:44:05,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3097286.6666666665, ans=0.125
2023-11-25 22:44:08,190 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7700, loss[loss=0.06038, simple_loss=0.08084, pruned_loss=0.009966, audio_tagging_loss=0.009989, over 14550.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09059, pruned_loss=0.01289, audio_tagging_loss=0.008886, over 3044380.60 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0
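The ubiquitous `ScheduledFloat` records trace hyperparameters (dropout probabilities, skip rates, balancer limits) that are functions of `batch_count` rather than constants; `ans` is the current value. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the actual class in scaling.py may behave differently:

```python
# Hypothetical piecewise-linear schedule over batch_count, for illustration.
class ScheduledFloat:
    def __init__(self, *points: tuple):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]  # constant after the last breakpoint

# A dropout_p annealed from 0.3 to 0.1 over the first 20k batches would log
# ans=0.1 at the batch counts above, far past its final breakpoint.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
```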
2023-11-25 22:44:13,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3097353.3333333335, ans=0.0
2023-11-25 22:44:36,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.07 vs. limit=10.0
2023-11-25 22:44:36,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097486.6666666665, ans=0.1
2023-11-25 22:44:36,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3097486.6666666665, ans=0.125
2023-11-25 22:44:38,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3097486.6666666665, ans=15.0
2023-11-25 22:44:54,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3097620.0, ans=0.125
2023-11-25 22:44:58,671 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464650
2023-11-25 22:45:01,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=22.5
2023-11-25 22:45:04,329 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7750, loss[loss=0.04924, simple_loss=0.06049, pruned_loss=0.008009, audio_tagging_loss=0.01099, over 15474.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09081, pruned_loss=0.01285, audio_tagging_loss=0.008979, over 3050320.14 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:45:12,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3097686.6666666665, ans=0.125
2023-11-25 22:45:15,504 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0
2023-11-25 22:45:19,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0
2023-11-25 22:45:35,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3097886.6666666665, ans=0.04949747468305833
2023-11-25 22:45:41,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3097886.6666666665, ans=0.2
2023-11-25 22:45:46,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.755e+01 9.240e+01 9.987e+01 1.306e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-25 22:45:53,710 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464700
2023-11-25 22:45:54,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0
2023-11-25 22:45:59,383 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7800, loss[loss=0.07785, simple_loss=0.1102, pruned_loss=0.01394, audio_tagging_loss=0.008788, over 14348.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09178, pruned_loss=0.013, audio_tagging_loss=0.009031, over 3055828.08 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:46:05,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.72 vs. limit=15.0
2023-11-25 22:46:11,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3098086.6666666665, ans=0.0
2023-11-25 22:46:18,496 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 22:46:25,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3098153.3333333335, ans=0.2
2023-11-25 22:46:28,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3098153.3333333335, ans=0.1
2023-11-25 22:46:44,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3098286.6666666665, ans=0.125
2023-11-25 22:46:46,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3098286.6666666665, ans=0.1
2023-11-25 22:46:47,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3098286.6666666665, ans=0.125
2023-11-25 22:46:48,762 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464750
2023-11-25 22:46:50,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3098286.6666666665, ans=0.125
2023-11-25 22:46:54,019 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7850, loss[loss=0.06485, simple_loss=0.09284, pruned_loss=0.01192, audio_tagging_loss=0.006504, over 15894.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09185, pruned_loss=0.01309, audio_tagging_loss=0.009118, over 3057860.02 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:47:28,622 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 22:47:28,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=22.5
2023-11-25 22:47:35,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.791e+01 9.341e+01 1.014e+02 1.334e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-25 22:47:43,164 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464800
2023-11-25 22:47:49,381 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7900, loss[loss=0.07196, simple_loss=0.09155, pruned_loss=0.01534, audio_tagging_loss=0.01085, over 15512.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09205, pruned_loss=0.01317, audio_tagging_loss=0.009172, over 3063314.50 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:47:56,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3098686.6666666665, ans=0.0
2023-11-25 22:48:02,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3098753.3333333335, ans=0.125
2023-11-25 22:48:14,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3098820.0, ans=0.125
2023-11-25 22:48:18,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0
2023-11-25 22:48:33,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0
2023-11-25 22:48:39,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464850
2023-11-25 22:48:44,372 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7950, loss[loss=0.05185, simple_loss=0.06479, pruned_loss=0.009293, audio_tagging_loss=0.01016, over 15329.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09132, pruned_loss=0.01298, audio_tagging_loss=0.009229, over 3053755.43 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:48:51,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.73 vs. limit=15.0
2023-11-25 22:48:57,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3099086.6666666665, ans=0.125
2023-11-25 22:48:58,487 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 22:49:16,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=12.0
2023-11-25 22:49:18,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3099220.0, ans=0.1
2023-11-25 22:49:18,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0
2023-11-25 22:49:23,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3099220.0, ans=0.125
2023-11-25 22:49:24,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.58 vs. limit=6.0
2023-11-25 22:49:25,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0
2023-11-25 22:49:26,274 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.696e+01 9.334e+01 1.006e+02 1.500e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-25 22:49:28,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3099286.6666666665, ans=0.1
2023-11-25 22:49:28,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3099286.6666666665, ans=0.0
2023-11-25 22:49:34,214 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464900
2023-11-25 22:49:36,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0
2023-11-25 22:49:39,375 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8000, loss[loss=0.06828, simple_loss=0.08984, pruned_loss=0.01448, audio_tagging_loss=0.008877, over 15401.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.0906, pruned_loss=0.01298, audio_tagging_loss=0.009356, over 3055147.49 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:50:01,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3099486.6666666665, ans=0.125
2023-11-25 22:50:11,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3099553.3333333335, ans=0.0
2023-11-25 22:50:28,795 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464950
2023-11-25 22:50:34,964 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8050, loss[loss=0.06238, simple_loss=0.08192, pruned_loss=0.01347, audio_tagging_loss=0.007953, over 15040.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09053, pruned_loss=0.01298, audio_tagging_loss=0.009357, over 3054230.64 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:50:50,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3099753.3333333335, ans=0.125
2023-11-25 22:50:57,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3099820.0, ans=0.0
2023-11-25 22:51:16,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.622e+01 9.226e+01 9.839e+01 1.205e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-25 22:51:20,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3099953.3333333335, ans=0.0
2023-11-25 22:51:24,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3099953.3333333335, ans=0.1
2023-11-25 22:51:24,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465000
2023-11-25 22:51:27,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3099953.3333333335, ans=0.2
2023-11-25 22:51:29,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0
2023-11-25 22:51:30,381 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8100, loss[loss=0.08241, simple_loss=0.1003, pruned_loss=0.02085, audio_tagging_loss=0.01143, over 15895.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09138, pruned_loss=0.01324, audio_tagging_loss=0.009356, over 3048564.70 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0
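`grad_scale` is the loss-scaling factor of the fp16 run: it shrinks on gradient overflow and periodically doubles when gradients stay finite, which is why it moves between 8.0, 16.0 and 32.0 across these records. A generic, self-contained PyTorch AMP loop showing that mechanism (a sketch, not this script's actual code; the toy model and data are placeholders):

```python
import torch

# Toy setup so the loop is runnable; the real script's model and data differ.
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
loader = [torch.randn(8, 80, device="cuda") for _ in range(4)]

scaler = torch.cuda.amp.GradScaler()  # manages the grad_scale seen in the log
for feats in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(feats).pow(2).mean()
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # skipped internally if grads hit inf/nan
    scaler.update()                # shrinks the scale on overflow, otherwise
                                   # doubles it every growth_interval steps
    print(scaler.get_scale())
```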
2023-11-25 22:51:38,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5
2023-11-25 22:52:19,622 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465050
2023-11-25 22:52:24,794 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8150, loss[loss=0.06032, simple_loss=0.08232, pruned_loss=0.01027, audio_tagging_loss=0.008888, over 15874.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09051, pruned_loss=0.01294, audio_tagging_loss=0.009188, over 3046302.14 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:52:32,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3100353.3333333335, ans=0.125
2023-11-25 22:52:39,654 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 22:52:39,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3100420.0, ans=0.125
2023-11-25 22:52:59,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3100553.3333333335, ans=0.125
2023-11-25 22:53:06,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.506e+01 9.069e+01 1.015e+02 1.632e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-25 22:53:07,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0
2023-11-25 22:53:07,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3100620.0, ans=0.125
2023-11-25 22:53:14,726 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465100
2023-11-25 22:53:16,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3100620.0, ans=0.2
2023-11-25 22:53:20,590 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8200, loss[loss=0.07312, simple_loss=0.11, pruned_loss=0.01072, audio_tagging_loss=0.007377, over 16305.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09005, pruned_loss=0.01283, audio_tagging_loss=0.009149, over 3038150.96 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:53:23,155 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
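The WARNING above is a sanity filter: a 1-second AudioSet clip yields 100 feature frames but only 23 encoder frames after 4x subsampling, while its dummy transcript has 24 tokens, and a transducer loss cannot emit more tokens than it has frames. A sketch of such a check, assuming the usual icefall convolutional-subsampling arithmetic (an assumption; the exact formula may differ):

```python
# Drop cuts whose encoder output would be shorter than the token sequence.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2  # 100 -> 23
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False: matches the excluded dummy-text cuts
```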
2023-11-25 22:53:35,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3100753.3333333335, ans=0.125
2023-11-25 22:53:35,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3100753.3333333335, ans=0.1
2023-11-25 22:53:49,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3100820.0, ans=0.125
2023-11-25 22:53:53,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3100886.6666666665, ans=0.0
2023-11-25 22:53:57,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3100886.6666666665, ans=0.1
2023-11-25 22:54:05,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.87 vs. limit=10.0
2023-11-25 22:54:11,018 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465150
2023-11-25 22:54:14,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3100953.3333333335, ans=0.0
2023-11-25 22:54:16,191 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8250, loss[loss=0.08056, simple_loss=0.1151, pruned_loss=0.01541, audio_tagging_loss=0.007582, over 16246.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.08951, pruned_loss=0.01274, audio_tagging_loss=0.009169, over 3046149.51 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:54:39,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3101153.3333333335, ans=0.125
2023-11-25 22:54:58,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.606e+01 9.259e+01 1.021e+02 1.240e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-25 22:55:04,737 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465200
2023-11-25 22:55:10,184 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8300, loss[loss=0.07172, simple_loss=0.08592, pruned_loss=0.01927, audio_tagging_loss=0.009493, over 15287.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08925, pruned_loss=0.01268, audio_tagging_loss=0.009173, over 3044264.62 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:55:14,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3101353.3333333335, ans=0.125
2023-11-25 22:55:17,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3101353.3333333335, ans=0.125
2023-11-25 22:55:58,983 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465250
2023-11-25 22:56:04,593 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8350, loss[loss=0.06667, simple_loss=0.09538, pruned_loss=0.01054, audio_tagging_loss=0.008443, over 14898.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08943, pruned_loss=0.01256, audio_tagging_loss=0.009103, over 3052565.90 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:56:16,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3101753.3333333335, ans=0.0
2023-11-25 22:56:21,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101753.3333333335, ans=0.125
2023-11-25 22:56:23,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3101753.3333333335, ans=0.2
2023-11-25 22:56:43,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.17 vs. limit=22.5
2023-11-25 22:56:46,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.512e+01 9.293e+01 1.012e+02 1.242e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 22:56:49,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0
2023-11-25 22:56:54,225 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465300
2023-11-25 22:56:59,882 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8400, loss[loss=0.06705, simple_loss=0.09318, pruned_loss=0.01129, audio_tagging_loss=0.009162, over 14182.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08841, pruned_loss=0.01224, audio_tagging_loss=0.009023, over 3051282.94 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:57:11,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3102086.6666666665, ans=0.2
2023-11-25 22:57:20,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0
2023-11-25 22:57:25,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3102153.3333333335, ans=0.95
2023-11-25 22:57:26,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3102153.3333333335, ans=0.0
2023-11-25 22:57:35,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0
2023-11-25 22:57:36,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3102220.0, ans=0.2
2023-11-25 22:57:48,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465350
2023-11-25 22:57:48,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3102286.6666666665, ans=0.1
2023-11-25 22:57:53,638 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8450, loss[loss=0.04832, simple_loss=0.06453, pruned_loss=0.009007, audio_tagging_loss=0.007049, over 14062.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08943, pruned_loss=0.01241, audio_tagging_loss=0.008975, over 3052064.95 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:58:22,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3102486.6666666665, ans=0.0
2023-11-25 22:58:35,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3102553.3333333335, ans=0.0
2023-11-25 22:58:35,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.915e+01 9.393e+01 9.975e+01 1.301e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-25 22:58:42,283 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465400
2023-11-25 22:58:42,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3102620.0, ans=0.125
2023-11-25 22:58:43,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3102620.0, ans=0.2
2023-11-25 22:58:47,845 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8500, loss[loss=0.08233, simple_loss=0.102, pruned_loss=0.02359, audio_tagging_loss=0.007759, over 14872.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08962, pruned_loss=0.01261, audio_tagging_loss=0.00896, over 3046825.68 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:59:04,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3102753.3333333335, ans=0.2
2023-11-25 22:59:16,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3102820.0, ans=0.035
2023-11-25 22:59:31,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.09 vs. limit=10.0
2023-11-25 22:59:31,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0
2023-11-25 22:59:37,815 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465450
2023-11-25 22:59:41,185 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 22:59:43,588 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8550, loss[loss=0.0629, simple_loss=0.08152, pruned_loss=0.01202, audio_tagging_loss=0.01012, over 14559.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08962, pruned_loss=0.01249, audio_tagging_loss=0.009001, over 3045508.51 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:59:54,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3103086.6666666665, ans=0.2
2023-11-25 22:59:58,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3103086.6666666665, ans=0.125
2023-11-25 23:00:13,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3103220.0, ans=0.0
2023-11-25 23:00:25,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.599e+01 9.050e+01 9.776e+01 1.276e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-25 23:00:29,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3103286.6666666665, ans=0.0
2023-11-25 23:00:32,216 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465500
2023-11-25 23:00:33,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3103286.6666666665, ans=0.125
2023-11-25 23:00:37,339 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8600, loss[loss=0.06087, simple_loss=0.08453, pruned_loss=0.01053, audio_tagging_loss=0.008077, over 15382.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08964, pruned_loss=0.01251, audio_tagging_loss=0.009081, over 3042386.19 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:00:45,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3103353.3333333335, ans=0.125
2023-11-25 23:00:56,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3103420.0, ans=0.125
2023-11-25 23:01:01,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0
2023-11-25 23:01:06,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3103486.6666666665, ans=0.1
2023-11-25 23:01:09,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3103553.3333333335, ans=0.125
2023-11-25 23:01:26,005 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465550
2023-11-25 23:01:28,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3103620.0, ans=0.125
2023-11-25 23:01:31,130 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8650, loss[loss=0.09267, simple_loss=0.1276, pruned_loss=0.02117, audio_tagging_loss=0.007702, over 15442.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08992, pruned_loss=0.01243, audio_tagging_loss=0.009184, over 3040475.35 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:01:33,661 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
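The `Whitening` records compare a per-module statistic against a limit. A plausible reading (an assumption about scaling.py, not a quote of it): the metric is 1.0 when the activation covariance is proportional to the identity and grows as the variance concentrates in few directions, so values well under the limit, as in most entries here, mean the activations stay reasonably "white":

```python
import torch

# Sketch of a whitening metric: mean squared eigenvalue of the covariance
# divided by the squared mean eigenvalue (>= 1, with 1 == perfectly white).
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.T @ x / x.shape[0]
    n = cov.shape[0]
    mean_sq_eig = torch.trace(cov @ cov) / n    # (1/n) * sum(eig_i ** 2)
    sq_mean_eig = (torch.trace(cov) / n) ** 2   # ((1/n) * sum(eig_i)) ** 2
    return mean_sq_eig / sq_mean_eig
```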
2023-11-25 23:01:40,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3103686.6666666665, ans=0.125
2023-11-25 23:01:48,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5
2023-11-25 23:01:58,181 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 23:02:13,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.616e+01 9.272e+01 9.852e+01 1.304e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-25 23:02:20,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465600
2023-11-25 23:02:25,780 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8700, loss[loss=0.04823, simple_loss=0.05384, pruned_loss=0.00698, audio_tagging_loss=0.01433, over 15307.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09093, pruned_loss=0.01259, audio_tagging_loss=0.009231, over 3042486.42 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:02:32,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3104020.0, ans=0.1
2023-11-25 23:02:47,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3104153.3333333335, ans=0.125
2023-11-25 23:03:15,438 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465650
2023-11-25 23:03:19,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3104353.3333333335, ans=0.125
2023-11-25 23:03:20,587 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8750, loss[loss=0.04399, simple_loss=0.06017, pruned_loss=0.004702, audio_tagging_loss=0.009198, over 15687.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09122, pruned_loss=0.01278, audio_tagging_loss=0.009262, over 3040515.68 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:03:23,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3104353.3333333335, ans=0.1
2023-11-25 23:03:29,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3104353.3333333335, ans=0.125
2023-11-25 23:03:30,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3104420.0, ans=0.125
2023-11-25 23:03:30,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3104420.0, ans=0.0
2023-11-25 23:03:30,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3104420.0, ans=0.5
2023-11-25 23:03:37,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3104420.0, ans=0.125
2023-11-25 23:03:38,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3104420.0, ans=0.0
2023-11-25 23:03:40,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104486.6666666665, ans=0.1
2023-11-25 23:04:03,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.696e+01 9.362e+01 9.858e+01 1.375e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-25 23:04:04,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0
2023-11-25 23:04:08,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104620.0, ans=0.1
2023-11-25 23:04:09,616 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465700
2023-11-25 23:04:13,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0
2023-11-25 23:04:14,838 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8800, loss[loss=0.04389, simple_loss=0.05438, pruned_loss=0.007443, audio_tagging_loss=0.009258, over 15197.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.0908, pruned_loss=0.01251, audio_tagging_loss=0.009277, over 3039683.42 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:04:53,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3104886.6666666665, ans=0.125
2023-11-25 23:04:56,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5
2023-11-25 23:05:04,343 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465750
2023-11-25 23:05:10,608 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8850, loss[loss=0.06785, simple_loss=0.09367, pruned_loss=0.01405, audio_tagging_loss=0.00697, over 15589.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09036, pruned_loss=0.01264, audio_tagging_loss=0.009256, over 3047199.23 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
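The fractional frame counts in `tot_loss[..., over 3047199.23 frames.]` suggest the aggregate is a decayed, frames-weighted running average rather than a plain cumulative sum. A small sketch of that bookkeeping (the decay constant is illustrative, not taken from the script):

```python
# Frames-weighted running loss with exponential forgetting (sketch).
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.weighted_loss / self.frames  # the logged tot_loss value
```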
2023-11-25 23:05:22,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.68 vs. limit=6.0
2023-11-25 23:05:23,125 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 23:05:26,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3105086.6666666665, ans=0.0
2023-11-25 23:05:44,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3105220.0, ans=0.125
2023-11-25 23:05:49,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3105220.0, ans=0.2
2023-11-25 23:05:53,528 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.482e+01 9.169e+01 1.001e+02 1.243e+02, threshold=1.834e+02, percent-clipped=0.0
2023-11-25 23:05:59,562 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 23:06:00,495 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465800
2023-11-25 23:06:06,437 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8900, loss[loss=0.07968, simple_loss=0.1108, pruned_loss=0.01808, audio_tagging_loss=0.006199, over 15265.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.08993, pruned_loss=0.01259, audio_tagging_loss=0.009123, over 3052577.91 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:06:07,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3105353.3333333335, ans=0.0
2023-11-25 23:06:09,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3105353.3333333335, ans=0.1
2023-11-25 23:06:10,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3105353.3333333335, ans=0.04949747468305833
2023-11-25 23:06:15,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3105353.3333333335, ans=0.05
2023-11-25 23:06:33,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3105486.6666666665, ans=0.1
2023-11-25 23:06:33,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3105486.6666666665, ans=0.125
2023-11-25 23:06:43,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3105553.3333333335, ans=0.0
2023-11-25 23:06:45,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3105553.3333333335, ans=0.125
2023-11-25 23:06:55,578 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465850
2023-11-25 23:07:00,853 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8950, loss[loss=0.05531, simple_loss=0.07436, pruned_loss=0.009672, audio_tagging_loss=0.008455, over 15175.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09098, pruned_loss=0.01272, audio_tagging_loss=0.008903, over 3051527.62 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:07:01,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3105686.6666666665, ans=0.125
2023-11-25 23:07:15,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3105753.3333333335, ans=0.125
2023-11-25 23:07:19,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3105753.3333333335, ans=15.0
2023-11-25 23:07:37,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3105886.6666666665, ans=0.125
2023-11-25 23:07:43,925 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.637e+01 9.614e+01 1.032e+02 1.612e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-25 23:07:50,274 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465900
2023-11-25 23:07:56,439 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9000, loss[loss=0.08596, simple_loss=0.1249, pruned_loss=0.0186, audio_tagging_loss=0.004916, over 15439.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09074, pruned_loss=0.01269, audio_tagging_loss=0.008886, over 3049923.53 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:07:56,439 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-25 23:08:18,271 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2827, 4.2485, 4.5121, 4.4582], device='cuda:2')
2023-11-25 23:08:26,185 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4714, 3.7819, 4.3685, 3.5845], device='cuda:2')
2023-11-25 23:08:28,213 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05899, simple_loss=0.0507, pruned_loss=0.005227, audio_tagging_loss=0.02841, over 4681554.00 frames.
2023-11-25 23:08:28,214 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-25 23:08:41,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3106086.6666666665, ans=0.125
2023-11-25 23:09:05,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3106220.0, ans=0.125
2023-11-25 23:09:17,797 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465950
2023-11-25 23:09:22,963 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9050, loss[loss=0.09006, simple_loss=0.1164, pruned_loss=0.02326, audio_tagging_loss=0.0086, over 16107.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09037, pruned_loss=0.01262, audio_tagging_loss=0.008851, over 3047912.99 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:09:25,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3106353.3333333335, ans=0.1
2023-11-25 23:10:02,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0
2023-11-25 23:10:03,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3106553.3333333335, ans=0.0
2023-11-25 23:10:07,088 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.869e+01 9.445e+01 1.003e+02 1.420e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-25 23:10:08,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3106620.0, ans=0.125
2023-11-25 23:10:10,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3106620.0, ans=0.125
2023-11-25 23:10:12,425 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466000
2023-11-25 23:10:18,660 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9100, loss[loss=0.08864, simple_loss=0.1315, pruned_loss=0.01587, audio_tagging_loss=0.007032, over 15905.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09067, pruned_loss=0.01261, audio_tagging_loss=0.008771, over 3046193.86 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
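During the validation pass above, zipformer.py logs `attn_weights_entropy`, one value per attention head: higher values mean more diffuse attention, values near zero a collapsed head. A sketch of how such a diagnostic can be computed (names and shapes are assumptions, not the module's actual code):

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), each row a distribution over keys
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # entropy per head, per query
    return ent.mean(dim=-1)                         # one scalar per head
```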
2023-11-25 23:10:36,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3106753.3333333335, ans=0.0
2023-11-25 23:11:08,263 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466050
2023-11-25 23:11:13,469 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9150, loss[loss=0.08058, simple_loss=0.1077, pruned_loss=0.01513, audio_tagging_loss=0.01162, over 16387.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09015, pruned_loss=0.01254, audio_tagging_loss=0.008824, over 3043256.85 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:11:19,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3107020.0, ans=0.125
2023-11-25 23:11:31,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3107086.6666666665, ans=0.125
2023-11-25 23:11:53,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3107220.0, ans=0.125
2023-11-25 23:11:56,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.490e+01 9.148e+01 9.794e+01 1.489e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-25 23:12:02,212 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466100
2023-11-25 23:12:07,884 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9200, loss[loss=0.08042, simple_loss=0.1074, pruned_loss=0.01755, audio_tagging_loss=0.009161, over 14950.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08989, pruned_loss=0.0126, audio_tagging_loss=0.00889, over 3032659.33 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:12:16,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3107353.3333333335, ans=0.125
2023-11-25 23:12:29,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3107486.6666666665, ans=0.035
2023-11-25 23:12:36,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3107486.6666666665, ans=0.1
2023-11-25 23:12:39,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3107553.3333333335, ans=0.0
2023-11-25 23:12:41,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3107553.3333333335, ans=0.2
2023-11-25 23:12:45,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0
2023-11-25 23:12:57,032 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466150
2023-11-25 23:12:58,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3107620.0, ans=10.0
2023-11-25 23:13:03,220 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9250, loss[loss=0.07326, simple_loss=0.106, pruned_loss=0.01219, audio_tagging_loss=0.008074, over 16035.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09047, pruned_loss=0.01276, audio_tagging_loss=0.008843, over 3035068.41 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:13:05,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0
2023-11-25 23:13:17,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3107753.3333333335, ans=0.2
2023-11-25 23:13:39,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3107886.6666666665, ans=0.125
2023-11-25 23:13:46,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.554e+01 9.246e+01 1.012e+02 1.216e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-25 23:13:52,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466200
2023-11-25 23:13:58,064 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9300, loss[loss=0.09222, simple_loss=0.1279, pruned_loss=0.01957, audio_tagging_loss=0.008683, over 14700.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09025, pruned_loss=0.01272, audio_tagging_loss=0.008851, over 3037347.27 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:14:17,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3108086.6666666665, ans=0.2
2023-11-25 23:14:18,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3108153.3333333335, ans=0.125
2023-11-25 23:14:23,948 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0
2023-11-25 23:14:29,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3108220.0, ans=0.125
2023-11-25 23:14:30,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3108220.0, ans=0.2
2023-11-25 23:14:34,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3108220.0, ans=0.125
2023-11-25 23:14:46,890 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466250
2023-11-25 23:14:50,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3108286.6666666665, ans=0.2
2023-11-25 23:14:52,149 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9350, loss[loss=0.05928, simple_loss=0.08225, pruned_loss=0.01019, audio_tagging_loss=0.007966, over 14735.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09024, pruned_loss=0.01283, audio_tagging_loss=0.008898, over 3028421.80 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:15:08,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3108420.0, ans=0.04949747468305833
2023-11-25 23:15:28,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3108553.3333333335, ans=0.5
2023-11-25 23:15:36,369 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.512e+01 9.083e+01 9.779e+01 1.171e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-25 23:15:41,148 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466300
2023-11-25 23:15:46,817 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9400, loss[loss=0.05154, simple_loss=0.07512, pruned_loss=0.005822, audio_tagging_loss=0.008162, over 15566.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08976, pruned_loss=0.01268, audio_tagging_loss=0.009055, over 3034277.13 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:15:52,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3108686.6666666665, ans=0.125
2023-11-25 23:15:59,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0
2023-11-25 23:16:00,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3108753.3333333335, ans=15.0
2023-11-25 23:16:02,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3108753.3333333335, ans=0.125
2023-11-25 23:16:35,600 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466350
2023-11-25 23:16:41,287 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9450, loss[loss=0.06648, simple_loss=0.07449, pruned_loss=0.01802, audio_tagging_loss=0.01121, over 15212.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09005, pruned_loss=0.01276, audio_tagging_loss=0.009036, over 3035464.54 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:16:42,346 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 23:16:48,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0
2023-11-25 23:16:49,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3109020.0, ans=0.125
2023-11-25 23:16:52,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109086.6666666665, ans=0.1
2023-11-25 23:17:04,427 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0
2023-11-25 23:17:25,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.505e+01 9.184e+01 9.882e+01 1.417e+02, threshold=1.837e+02, percent-clipped=0.0
2023-11-25 23:17:28,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3109286.6666666665, ans=0.2
2023-11-25 23:17:30,174 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466400
2023-11-25 23:17:35,699 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9500, loss[loss=0.04957, simple_loss=0.05993, pruned_loss=0.009263, audio_tagging_loss=0.01034, over 15874.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09028, pruned_loss=0.01266, audio_tagging_loss=0.009103, over 3042047.44 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:17:53,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3109420.0, ans=0.0
2023-11-25 23:17:53,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0
2023-11-25 23:17:58,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3109486.6666666665, ans=0.09899494936611666
2023-11-25 23:18:02,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3109486.6666666665, ans=0.125
2023-11-25 23:18:24,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466450
2023-11-25 23:18:30,750 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9550, loss[loss=0.06751, simple_loss=0.09893, pruned_loss=0.01147, audio_tagging_loss=0.006573, over 15555.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.0903, pruned_loss=0.01244, audio_tagging_loss=0.009203, over 3049269.30 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:18:46,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3109753.3333333335, ans=0.125
2023-11-25 23:18:46,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3109753.3333333335, ans=0.125
2023-11-25 23:19:12,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3109886.6666666665, ans=0.125
2023-11-25 23:19:16,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.693e+01 9.287e+01 1.001e+02 1.223e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-25 23:19:19,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3109953.3333333335, ans=0.0
2023-11-25 23:19:20,376 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466500
2023-11-25 23:19:26,154 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9600, loss[loss=0.04743, simple_loss=0.05977, pruned_loss=0.006765, audio_tagging_loss=0.01078, over 15974.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09109, pruned_loss=0.01254, audio_tagging_loss=0.009183, over 3052921.56 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:19:29,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3110020.0, ans=0.025
2023-11-25 23:19:30,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3110020.0, ans=0.07
2023-11-25 23:19:55,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3110153.3333333335, ans=0.125
2023-11-25 23:20:10,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3110286.6666666665, ans=0.125
2023-11-25 23:20:12,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3110286.6666666665, ans=0.125
2023-11-25 23:20:14,825 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466550
2023-11-25 23:20:20,015 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9650, loss[loss=0.05697, simple_loss=0.07597, pruned_loss=0.009152, audio_tagging_loss=0.009839, over 15861.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09048, pruned_loss=0.01257, audio_tagging_loss=0.009171, over 3044435.23 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:20:20,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3110353.3333333335, ans=0.2
2023-11-25 23:20:25,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3110353.3333333335, ans=0.125
2023-11-25 23:20:29,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3110420.0, ans=0.125
2023-11-25 23:21:04,537 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0
2023-11-25 23:21:05,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.886e+01 9.411e+01 1.006e+02 1.308e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-25 23:21:09,293 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466600
2023-11-25 23:21:13,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3110686.6666666665, ans=0.0
2023-11-25 23:21:14,686 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9700, loss[loss=0.04784, simple_loss=0.05771, pruned_loss=0.0104, audio_tagging_loss=0.008587, over 14795.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09075, pruned_loss=0.01276, audio_tagging_loss=0.008993, over 3046640.86 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:21:38,506 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0
2023-11-25 23:21:46,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3110820.0, ans=0.2
2023-11-25 23:21:46,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3110820.0, ans=0.0
2023-11-25 23:21:48,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0
2023-11-25 23:22:01,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3110953.3333333335, ans=0.125
2023-11-25 23:22:04,901 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466650
2023-11-25 23:22:07,698 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 23:22:11,104 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9750, loss[loss=0.05508, simple_loss=0.07539, pruned_loss=0.008895, audio_tagging_loss=0.008491, over 15812.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.0909, pruned_loss=0.01283, audio_tagging_loss=0.008898, over 3049443.68 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:22:15,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3111020.0, ans=0.0
2023-11-25 23:22:17,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3111020.0, ans=0.95
2023-11-25 23:22:19,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3111020.0, ans=0.0
2023-11-25 23:22:19,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3111020.0, ans=0.125
2023-11-25 23:22:35,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3111153.3333333335, ans=0.125
2023-11-25 23:22:38,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3111153.3333333335, ans=0.0
2023-11-25 23:22:42,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3111220.0, ans=0.125
2023-11-25 23:22:51,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3111220.0, ans=0.125
2023-11-25 23:22:57,246 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.598e+01 9.280e+01 1.031e+02 1.262e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-25 23:23:00,451 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466700
2023-11-25 23:23:05,698 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9800, loss[loss=0.06619, simple_loss=0.084, pruned_loss=0.01253, audio_tagging_loss=0.01166, over 14816.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.0904, pruned_loss=0.01277, audio_tagging_loss=0.00887, over 3045717.86 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:23:08,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.52 vs.
limit=15.0 2023-11-25 23:23:19,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.79 vs. limit=10.0 2023-11-25 23:23:27,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3111486.6666666665, ans=0.125 2023-11-25 23:23:30,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3111486.6666666665, ans=10.0 2023-11-25 23:23:39,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3111553.3333333335, ans=0.125 2023-11-25 23:23:50,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3111620.0, ans=0.0 2023-11-25 23:23:55,144 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466750 2023-11-25 23:23:56,121 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:24:00,427 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9850, loss[loss=0.05272, simple_loss=0.06721, pruned_loss=0.00848, audio_tagging_loss=0.01063, over 16662.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09063, pruned_loss=0.01268, audio_tagging_loss=0.008804, over 3041565.15 frames. 
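The WARNING just above drops an AudioSet cut whose transcript is only the "Dummy text added as a place holder" filler: its 100 input frames shrink to 23 after front-end subsampling, fewer than the 24 BPE tokens, and a transducer loss needs at least one encoder frame per target token. A sketch of that filter (assumed logic, not the exact train_asr.py code, though the subsampling formula below does reproduce 100 -> 23):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed convolutional front-end shrinkage; gives 100 -> 23 as in the warning.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Exclude cuts that end up with fewer encoder frames than target tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False  # the excluded cut above: 23 frames < 24 tokens
```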
], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:24:09,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3111686.6666666665, ans=0.2 2023-11-25 23:24:28,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3111820.0, ans=0.125 2023-11-25 23:24:32,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3111886.6666666665, ans=0.0 2023-11-25 23:24:35,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3111886.6666666665, ans=0.09899494936611666 2023-11-25 23:24:36,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3111886.6666666665, ans=0.0 2023-11-25 23:24:37,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111886.6666666665, ans=0.1 2023-11-25 23:24:45,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.652e+01 9.205e+01 1.019e+02 1.596e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-25 23:24:50,025 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466800 2023-11-25 23:24:52,589 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:24:54,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3112020.0, ans=0.1 2023-11-25 23:24:55,434 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9900, loss[loss=0.06735, simple_loss=0.09045, pruned_loss=0.01367, audio_tagging_loss=0.008452, over 14642.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09058, pruned_loss=0.0126, audio_tagging_loss=0.008769, over 3044862.39 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:24:57,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3112020.0, ans=0.0 2023-11-25 23:25:03,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-11-25 23:25:08,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112086.6666666665, ans=0.1 2023-11-25 23:25:17,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3112153.3333333335, ans=0.0 2023-11-25 23:25:26,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2023-11-25 23:25:45,817 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466850 2023-11-25 23:25:49,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3112286.6666666665, ans=0.0 2023-11-25 23:25:51,037 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9950, loss[loss=0.06079, simple_loss=0.08236, pruned_loss=0.008964, audio_tagging_loss=0.01064, over 15979.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09039, pruned_loss=0.01265, audio_tagging_loss=0.008825, over 3045058.62 frames. 
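Each `optim.py:476` line prints five order statistics (min, the three quartiles, max) of recent gradient norms together with a clipping threshold. The numbers are consistent with threshold = Clipping_scale x median: in the entry above, 2.0 x 9.205e+01 = 1.841e+02 exactly. A simplified sketch of that rule (an assumption; the actual optimizer in this recipe is more elaborate):

```python
import torch

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, history=128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []  # total grad norms of recent batches

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms = (self.norms + [norm.item()])[-self.history:]
        quartiles = torch.quantile(torch.tensor(self.norms),
                                   torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # 2x the median
        clipped = norm.item() > threshold
        if clipped:  # rescale all gradients so the total norm equals the threshold
            for g in grads:
                g.mul_(threshold / norm)
        return quartiles, threshold, clipped
```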
], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:25:53,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-25 23:26:33,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3112553.3333333335, ans=0.07 2023-11-25 23:26:36,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3112620.0, ans=0.125 2023-11-25 23:26:37,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.531e+01 9.197e+01 9.885e+01 1.494e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-25 23:26:38,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2023-11-25 23:26:40,482 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466900 2023-11-25 23:26:45,724 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10000, loss[loss=0.08726, simple_loss=0.1272, pruned_loss=0.01681, audio_tagging_loss=0.00683, over 15520.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09066, pruned_loss=0.01268, audio_tagging_loss=0.008811, over 3046221.99 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:26:49,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3112686.6666666665, ans=0.125 2023-11-25 23:26:53,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3112686.6666666665, ans=0.125 2023-11-25 23:27:00,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-25 23:27:06,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-25 23:27:11,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3112820.0, ans=0.125 2023-11-25 23:27:16,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3112820.0, ans=0.125 2023-11-25 23:27:18,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.96 vs. limit=10.0 2023-11-25 23:27:23,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3112886.6666666665, ans=0.125 2023-11-25 23:27:34,929 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466950 2023-11-25 23:27:41,127 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10050, loss[loss=0.05078, simple_loss=0.06644, pruned_loss=0.007686, audio_tagging_loss=0.009871, over 15475.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08962, pruned_loss=0.01252, audio_tagging_loss=0.008889, over 3047429.15 frames. 
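The dense `scaling.py:213` traffic records hyperparameters (skip rates, dropout probabilities, balancer limits) that are functions of `batch_count` rather than constants. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real class has considerably more machinery:

```python
class ScheduledFloat:
    """A float-valued hyperparameter interpolated over training progress."""
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation on this segment
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

# e.g. a skip rate that decays from 0.5 to 0.0 over the first 20k batches and
# stays at 0.0 afterwards (breakpoint values here are illustrative):
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value(3109486.0))  # far past the last breakpoint -> 0.0
```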
], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:27:58,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3113086.6666666665, ans=0.1 2023-11-25 23:28:28,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.554e+01 9.112e+01 9.756e+01 1.275e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 23:28:30,648 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467000 2023-11-25 23:28:36,529 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10100, loss[loss=0.06753, simple_loss=0.08617, pruned_loss=0.01344, audio_tagging_loss=0.01101, over 14528.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09055, pruned_loss=0.01273, audio_tagging_loss=0.008855, over 3053361.38 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:28:47,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=22.5 2023-11-25 23:29:12,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2023-11-25 23:29:22,923 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:29:25,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3113620.0, ans=0.0 2023-11-25 23:29:26,089 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467050 2023-11-25 23:29:26,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3113620.0, ans=0.125 2023-11-25 23:29:31,203 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10150, loss[loss=0.077, simple_loss=0.1081, pruned_loss=0.01556, audio_tagging_loss=0.007413, over 16109.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.0909, pruned_loss=0.01276, audio_tagging_loss=0.008891, over 3056396.43 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:29:34,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3113686.6666666665, ans=0.2 2023-11-25 23:29:50,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3113753.3333333335, ans=0.0 2023-11-25 23:29:55,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-11-25 23:29:58,933 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:30:07,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3113886.6666666665, ans=0.07 2023-11-25 23:30:16,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3113953.3333333335, ans=0.04949747468305833 2023-11-25 23:30:18,231 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.705e+01 9.387e+01 9.994e+01 1.374e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-25 23:30:20,447 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467100 2023-11-25 23:30:26,711 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10200, loss[loss=0.05587, simple_loss=0.07041, pruned_loss=0.009594, audio_tagging_loss=0.01107, over 14831.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09115, pruned_loss=0.01275, audio_tagging_loss=0.008919, over 3054591.63 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:30:31,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3114020.0, ans=0.0 2023-11-25 23:30:33,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2023-11-25 23:30:42,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-25 23:30:47,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3114153.3333333335, ans=0.05 2023-11-25 23:30:49,586 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:30:50,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3114153.3333333335, ans=0.125 2023-11-25 23:30:58,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3114220.0, ans=0.2 2023-11-25 23:31:15,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3114286.6666666665, ans=0.035 2023-11-25 23:31:16,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467150 2023-11-25 23:31:18,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3114286.6666666665, ans=0.0 2023-11-25 23:31:21,707 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10250, loss[loss=0.09451, simple_loss=0.1307, pruned_loss=0.02065, audio_tagging_loss=0.008489, over 16994.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09133, pruned_loss=0.01285, audio_tagging_loss=0.009113, over 3055521.87 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:31:38,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. 
limit=15.0 2023-11-25 23:31:43,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3114486.6666666665, ans=0.125 2023-11-25 23:31:50,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3114486.6666666665, ans=0.0 2023-11-25 23:32:03,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3114553.3333333335, ans=0.125 2023-11-25 23:32:08,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3114620.0, ans=0.0 2023-11-25 23:32:08,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.876e+01 9.394e+01 1.009e+02 1.335e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:32:11,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467200 2023-11-25 23:32:17,001 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10300, loss[loss=0.06227, simple_loss=0.08201, pruned_loss=0.01063, audio_tagging_loss=0.01063, over 14957.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09196, pruned_loss=0.01295, audio_tagging_loss=0.008967, over 3059005.88 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:32:19,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3114686.6666666665, ans=0.125 2023-11-25 23:32:30,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3114753.3333333335, ans=0.0 2023-11-25 23:32:34,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3114753.3333333335, ans=0.0 2023-11-25 23:32:37,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-25 23:32:42,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3114820.0, ans=0.125 2023-11-25 23:32:45,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114820.0, ans=0.1 2023-11-25 23:32:47,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-25 23:32:51,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3114886.6666666665, ans=0.0 2023-11-25 23:32:58,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3114886.6666666665, ans=0.2 2023-11-25 23:33:02,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3114953.3333333335, ans=0.125 2023-11-25 23:33:06,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467250 2023-11-25 23:33:10,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3115020.0, ans=0.125 2023-11-25 23:33:11,799 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10350, loss[loss=0.05739, simple_loss=0.07184, pruned_loss=0.01068, audio_tagging_loss=0.01079, over 15082.00 frames. 
], tot_loss[loss=0.06806, simple_loss=0.09205, pruned_loss=0.01295, audio_tagging_loss=0.00908, over 3056453.92 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:33:56,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3115286.6666666665, ans=0.125 2023-11-25 23:33:59,132 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.695e+01 9.211e+01 9.915e+01 1.210e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-25 23:34:01,281 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467300 2023-11-25 23:34:02,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3115286.6666666665, ans=0.125 2023-11-25 23:34:04,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3115286.6666666665, ans=0.125 2023-11-25 23:34:06,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3115353.3333333335, ans=0.125 2023-11-25 23:34:06,989 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10400, loss[loss=0.07539, simple_loss=0.09589, pruned_loss=0.01762, audio_tagging_loss=0.009822, over 14658.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09066, pruned_loss=0.01276, audio_tagging_loss=0.009243, over 3054954.51 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:34:10,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-25 23:34:14,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3115353.3333333335, ans=0.125 2023-11-25 23:34:20,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3115420.0, ans=0.0 2023-11-25 23:34:38,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3115486.6666666665, ans=0.05 2023-11-25 23:34:47,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3115553.3333333335, ans=0.05 2023-11-25 23:34:53,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3115620.0, ans=0.0 2023-11-25 23:34:56,624 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467350 2023-11-25 23:34:58,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0 2023-11-25 23:34:58,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3115620.0, ans=0.2 2023-11-25 23:35:01,778 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10450, loss[loss=0.06564, simple_loss=0.09284, pruned_loss=0.01148, audio_tagging_loss=0.00774, over 15202.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09015, pruned_loss=0.01258, audio_tagging_loss=0.009204, over 3055451.72 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:35:28,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.14 vs. 
limit=15.0 2023-11-25 23:35:43,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3115886.6666666665, ans=0.0 2023-11-25 23:35:49,277 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.667e+01 9.396e+01 1.018e+02 1.785e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:35:50,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3115953.3333333335, ans=0.95 2023-11-25 23:35:51,409 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467400 2023-11-25 23:35:56,808 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10500, loss[loss=0.05624, simple_loss=0.068, pruned_loss=0.01225, audio_tagging_loss=0.009986, over 14143.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09, pruned_loss=0.01265, audio_tagging_loss=0.009105, over 3053236.81 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:36:16,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3116086.6666666665, ans=10.0 2023-11-25 23:36:18,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3116153.3333333335, ans=0.05 2023-11-25 23:36:19,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2023-11-25 23:36:32,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3116220.0, ans=0.2 2023-11-25 23:36:38,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3116220.0, ans=0.2 2023-11-25 23:36:44,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116286.6666666665, ans=0.1 2023-11-25 23:36:46,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467450 2023-11-25 23:36:47,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.25 vs. limit=22.5 2023-11-25 23:36:52,573 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10550, loss[loss=0.0721, simple_loss=0.1019, pruned_loss=0.01378, audio_tagging_loss=0.007374, over 15839.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09028, pruned_loss=0.01258, audio_tagging_loss=0.008945, over 3048184.68 frames. 
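The `scaling.py:1022` "Whitening" lines fire when a decorrelation diagnostic for some activation exceeds its limit (`metric=13.05 vs. limit=15.0`, `metric=18.25 vs. limit=22.5` above). One plausible definition of such a metric, sketched here as an assumption: the spread of the eigenvalues of the per-group channel covariance, which equals 1.0 exactly when the features are white and grows as a few directions dominate:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of the
    channel covariance, computed per group; 1.0 iff perfectly whitened."""
    n, c = x.shape
    assert c % num_groups == 0
    xg = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, c/g)
    cov = torch.matmul(xg.transpose(1, 2), xg) / n                  # (g, c/g, c/g)
    eigs = torch.linalg.eigvalsh(cov)     # eigenvalue spectra, one row per group
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 384)                # nearly white random features
print(whitening_metric(x, num_groups=1))  # close to 1.0
```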
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:37:01,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3116353.3333333335, ans=0.125 2023-11-25 23:37:03,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3116420.0, ans=0.1 2023-11-25 23:37:07,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3116420.0, ans=0.125 2023-11-25 23:37:08,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3116420.0, ans=0.125 2023-11-25 23:37:16,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3116486.6666666665, ans=0.125 2023-11-25 23:37:40,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.690e+01 9.247e+01 9.972e+01 1.800e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 23:37:41,717 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467500 2023-11-25 23:37:41,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3116620.0, ans=0.0 2023-11-25 23:37:43,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3116620.0, ans=10.0 2023-11-25 23:37:45,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3116620.0, ans=0.04949747468305833 2023-11-25 23:37:46,811 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10600, loss[loss=0.07789, simple_loss=0.107, pruned_loss=0.01663, audio_tagging_loss=0.007786, over 15258.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09012, pruned_loss=0.01253, audio_tagging_loss=0.008814, over 3045751.62 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:37:56,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-25 23:38:11,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3116820.0, ans=0.09899494936611666 2023-11-25 23:38:21,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3116886.6666666665, ans=0.125 2023-11-25 23:38:29,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3116953.3333333335, ans=0.125 2023-11-25 23:38:35,966 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467550 2023-11-25 23:38:39,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3116953.3333333335, ans=0.0 2023-11-25 23:38:41,681 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10650, loss[loss=0.08188, simple_loss=0.1049, pruned_loss=0.0193, audio_tagging_loss=0.01014, over 15449.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09099, pruned_loss=0.01265, audio_tagging_loss=0.0088, over 3050799.56 frames. 
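The recurring `model.py:807` "Freeze_encoder: False; Current batch idx: N" entries confirm on every logging interval that encoder gradients are enabled. A hypothetical helper showing how such a flag is typically applied (the name, signature, and step-gating are assumptions for illustration):

```python
def apply_encoder_freeze(model, freeze_encoder: bool, batch_idx: int,
                         freeze_until_step: int = -1):
    # Disable encoder gradients when freezing is requested, optionally
    # only for the first freeze_until_step batches.
    freeze = freeze_encoder and (freeze_until_step < 0
                                 or batch_idx < freeze_until_step)
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
```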
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:38:48,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3117020.0, ans=0.2 2023-11-25 23:38:51,242 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:38:51,298 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:38:52,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3117086.6666666665, ans=0.0 2023-11-25 23:38:52,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-25 23:38:54,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3117086.6666666665, ans=0.0 2023-11-25 23:39:05,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3117153.3333333335, ans=0.125 2023-11-25 23:39:11,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3117153.3333333335, ans=0.0 2023-11-25 23:39:15,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=22.5 2023-11-25 23:39:17,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3117220.0, ans=0.125 2023-11-25 23:39:29,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3117286.6666666665, ans=0.125 2023-11-25 23:39:30,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.807e+01 9.255e+01 1.012e+02 1.355e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 23:39:31,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467600 2023-11-25 23:39:31,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3117286.6666666665, ans=0.0 2023-11-25 23:39:36,811 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10700, loss[loss=0.09109, simple_loss=0.1206, pruned_loss=0.02402, audio_tagging_loss=0.006798, over 15478.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.0909, pruned_loss=0.01261, audio_tagging_loss=0.008793, over 3046625.30 frames. 
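The `scaling.py:1118` "WithLoss ... loss-sum=0.000e+00" entries track an auxiliary penalty attached to the self-attention weights; a sum of zero means the penalty is currently inactive. The mechanism can be sketched as an autograd function that is the identity in the forward pass but injects the penalty's gradient in the backward pass (an assumed illustration, not the repo's exact code):

```python
import torch

class PenalizeLarge(torch.autograd.Function):
    """Identity on the forward pass; in backward, adds the gradient of
    scale * sum(relu(|x| - limit)), nudging oversized values back down."""
    @staticmethod
    def forward(ctx, x, limit, scale):
        ctx.save_for_backward(x)
        ctx.limit, ctx.scale = limit, scale
        return x.view_as(x)  # pass the activations through unchanged

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        extra = ctx.scale * torch.sign(x) * (x.abs() > ctx.limit).to(x.dtype)
        return grad_out + extra, None, None

attn = torch.randn(8, 100, 100, requires_grad=True)
y = PenalizeLarge.apply(attn, 25.0, 1e-4)  # all |x| <= 25 here, so the
loss_sum = 1e-4 * (attn.abs() - 25.0).clamp(min=0).sum()
print(f"loss-sum={loss_sum.item():.3e}")   # penalty prints 0.000e+00, as logged
```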
], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:39:44,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3117353.3333333335, ans=0.125 2023-11-25 23:39:46,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3117420.0, ans=0.0 2023-11-25 23:39:48,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3117420.0, ans=0.0 2023-11-25 23:39:55,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3117420.0, ans=0.0 2023-11-25 23:40:26,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467650 2023-11-25 23:40:31,270 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10750, loss[loss=0.05492, simple_loss=0.07365, pruned_loss=0.00897, audio_tagging_loss=0.00912, over 14443.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08919, pruned_loss=0.0123, audio_tagging_loss=0.008841, over 3035575.99 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:40:38,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2023-11-25 23:40:58,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5 2023-11-25 23:40:59,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3117820.0, ans=0.025 2023-11-25 23:41:07,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3117886.6666666665, ans=0.05 2023-11-25 23:41:08,279 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2023-11-25 23:41:08,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3117886.6666666665, ans=0.2 2023-11-25 23:41:19,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.994e+01 8.803e+01 9.280e+01 9.939e+01 1.365e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 23:41:20,288 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467700 2023-11-25 23:41:25,502 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10800, loss[loss=0.0559, simple_loss=0.0751, pruned_loss=0.009262, audio_tagging_loss=0.009093, over 14537.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08968, pruned_loss=0.01236, audio_tagging_loss=0.008803, over 3035056.99 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:42:13,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2023-11-25 23:42:15,801 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467750 2023-11-25 23:42:20,982 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10850, loss[loss=0.05656, simple_loss=0.07893, pruned_loss=0.006807, audio_tagging_loss=0.01029, over 14810.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08928, pruned_loss=0.01228, audio_tagging_loss=0.008908, over 3043269.16 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:42:33,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-25 23:42:41,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118486.6666666665, ans=0.1 2023-11-25 23:42:42,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3118486.6666666665, ans=0.025 2023-11-25 23:42:43,627 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:42:46,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3118486.6666666665, ans=0.125 2023-11-25 23:43:03,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-25 23:43:09,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.755e+01 9.381e+01 1.019e+02 1.994e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-25 23:43:09,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467800 2023-11-25 23:43:14,300 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:43:15,316 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10900, loss[loss=0.07043, simple_loss=0.09402, pruned_loss=0.01591, audio_tagging_loss=0.007513, over 14589.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08972, pruned_loss=0.01239, audio_tagging_loss=0.008935, over 3042297.76 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:43:18,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3118686.6666666665, ans=0.125 2023-11-25 23:43:25,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3118753.3333333335, ans=0.125 2023-11-25 23:43:29,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3118753.3333333335, ans=0.0 2023-11-25 23:43:42,363 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2023-11-25 23:44:04,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467850 2023-11-25 23:44:08,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3119020.0, ans=10.0 2023-11-25 23:44:09,340 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10950, loss[loss=0.08245, simple_loss=0.1177, pruned_loss=0.01595, audio_tagging_loss=0.007663, over 16358.00 frames. 
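The `grad_scale` figure in each loss line (toggling between 16.0 and 32.0 across this stretch of the log) is the dynamic loss-scaling factor of fp16 mixed-precision training: it grows after a stretch of overflow-free steps and halves whenever an inf/nan gradient forces a skipped step. A sketch of the standard torch.cuda.amp pattern that produces it (model, optimizer, and batch here are placeholders):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)            # forward runs in fp16 where safe
    scaler.scale(loss).backward()      # scale the loss up to avoid fp16 underflow
    scaler.step(optimizer)             # unscales grads; skips the step on inf/nan
    scaler.update()                    # halves the scale on overflow, regrows later
    return scaler.get_scale()          # the value logged as grad_scale
```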
], tot_loss[loss=0.0663, simple_loss=0.08974, pruned_loss=0.01248, audio_tagging_loss=0.008959, over 3047325.65 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:44:16,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3119020.0, ans=0.0 2023-11-25 23:44:26,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3119086.6666666665, ans=0.0 2023-11-25 23:44:53,230 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:44:58,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.372e+01 9.128e+01 9.666e+01 1.249e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-25 23:44:58,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467900 2023-11-25 23:45:03,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3119353.3333333335, ans=0.0 2023-11-25 23:45:04,510 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11000, loss[loss=0.07806, simple_loss=0.1166, pruned_loss=0.01205, audio_tagging_loss=0.007691, over 15522.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08926, pruned_loss=0.01232, audio_tagging_loss=0.009028, over 3052615.76 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:45:14,239 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2023-11-25 23:45:15,995 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:45:17,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3119420.0, ans=0.0 2023-11-25 23:45:20,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3119420.0, ans=0.125 2023-11-25 23:45:21,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3119420.0, ans=0.0 2023-11-25 23:45:29,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3119486.6666666665, ans=0.1 2023-11-25 23:45:34,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3119486.6666666665, ans=0.125 2023-11-25 23:45:37,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3119553.3333333335, ans=0.2 2023-11-25 23:45:43,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3119553.3333333335, ans=0.0 2023-11-25 23:45:52,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3119620.0, ans=0.125 2023-11-25 23:45:54,393 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467950 2023-11-25 23:45:59,567 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11050, loss[loss=0.07234, simple_loss=0.1041, pruned_loss=0.01375, audio_tagging_loss=0.00653, over 16560.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08982, pruned_loss=0.01241, audio_tagging_loss=0.009074, over 3046453.02 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:46:02,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3119686.6666666665, ans=0.2 2023-11-25 23:46:20,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3119820.0, ans=0.1 2023-11-25 23:46:24,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3119820.0, ans=0.1 2023-11-25 23:46:48,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.692e+01 9.297e+01 1.029e+02 1.368e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 23:46:48,443 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468000 2023-11-25 23:46:55,498 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11100, loss[loss=0.07709, simple_loss=0.1021, pruned_loss=0.01601, audio_tagging_loss=0.01006, over 16159.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.0903, pruned_loss=0.0126, audio_tagging_loss=0.009168, over 3049814.70 frames. 
], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:47:10,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3120086.6666666665, ans=0.125 2023-11-25 23:47:15,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3120086.6666666665, ans=0.125 2023-11-25 23:47:19,489 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:47:25,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3120153.3333333335, ans=0.125 2023-11-25 23:47:27,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3120220.0, ans=0.125 2023-11-25 23:47:35,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-11-25 23:47:44,323 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468050 2023-11-25 23:47:50,025 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11150, loss[loss=0.04736, simple_loss=0.05534, pruned_loss=0.006477, audio_tagging_loss=0.01321, over 13587.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09076, pruned_loss=0.01286, audio_tagging_loss=0.009198, over 3046985.74 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:48:01,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3120420.0, ans=0.0 2023-11-25 23:48:20,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2023-11-25 23:48:22,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3120553.3333333335, ans=0.0 2023-11-25 23:48:23,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2023-11-25 23:48:32,280 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:48:33,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3120620.0, ans=0.0 2023-11-25 23:48:38,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.667e+01 9.262e+01 9.903e+01 1.395e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-25 23:48:38,281 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468100 2023-11-25 23:48:43,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-11-25 23:48:43,929 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11200, loss[loss=0.05933, simple_loss=0.08592, pruned_loss=0.008547, audio_tagging_loss=0.007821, over 16044.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09128, pruned_loss=0.01293, audio_tagging_loss=0.009343, over 3045252.67 frames. 
], batch size: 61, lr: 1.72e-03, grad_scale: 32.0 2023-11-25 23:48:45,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3120686.6666666665, ans=0.1 2023-11-25 23:48:53,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3120753.3333333335, ans=0.0 2023-11-25 23:48:53,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3120753.3333333335, ans=0.2 2023-11-25 23:49:24,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3120886.6666666665, ans=0.125 2023-11-25 23:49:31,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468150 2023-11-25 23:49:36,760 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11250, loss[loss=0.07815, simple_loss=0.1125, pruned_loss=0.01283, audio_tagging_loss=0.00907, over 14827.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09049, pruned_loss=0.01279, audio_tagging_loss=0.009359, over 3040287.23 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:50:04,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-25 23:50:17,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3121220.0, ans=0.0 2023-11-25 23:50:21,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3121286.6666666665, ans=0.125 2023-11-25 23:50:25,541 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468200 2023-11-25 23:50:26,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.668e+01 9.346e+01 1.011e+02 2.547e+02, threshold=1.869e+02, percent-clipped=1.0 2023-11-25 23:50:31,483 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11300, loss[loss=0.06458, simple_loss=0.08277, pruned_loss=0.01383, audio_tagging_loss=0.00937, over 14962.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09097, pruned_loss=0.01281, audio_tagging_loss=0.009159, over 3040469.52 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:50:35,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121353.3333333335, ans=0.1 2023-11-25 23:50:38,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3121353.3333333335, ans=0.2 2023-11-25 23:50:45,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121420.0, ans=0.1 2023-11-25 23:51:05,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121553.3333333335, ans=0.1 2023-11-25 23:51:20,358 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468250 2023-11-25 23:51:25,989 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11350, loss[loss=0.06227, simple_loss=0.08021, pruned_loss=0.01359, audio_tagging_loss=0.008571, over 14567.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09033, pruned_loss=0.01279, audio_tagging_loss=0.009085, over 3046716.06 frames. 
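The learning rate in these lines ticks down from 1.73e-03 to 1.72e-03 around batch 11200 of epoch 39 (global batch idx ~468100), consistent with an Eden-style schedule that decays smoothly in both the batch index and the epoch. A sketch of that formula; the exact form and the constants below are assumptions, though they land near the logged value:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Learning rate decaying in both the global batch index and the epoch.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 468100, 38.6):.2e}")  # ~1.7e-03, near the logged value
```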
], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:51:43,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3121753.3333333335, ans=0.125 2023-11-25 23:51:57,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-25 23:51:57,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2023-11-25 23:52:12,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3121953.3333333335, ans=0.1 2023-11-25 23:52:15,190 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468300 2023-11-25 23:52:16,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.715e+01 9.313e+01 1.012e+02 1.423e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-25 23:52:20,329 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11400, loss[loss=0.07479, simple_loss=0.1045, pruned_loss=0.01605, audio_tagging_loss=0.006483, over 15271.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09137, pruned_loss=0.01296, audio_tagging_loss=0.00895, over 3045020.04 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:52:23,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-25 23:52:39,211 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:52:50,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3122153.3333333335, ans=0.035 2023-11-25 23:52:52,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3122220.0, ans=0.125 2023-11-25 23:52:54,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3122220.0, ans=0.125 2023-11-25 23:52:58,759 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:53:09,031 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468350 2023-11-25 23:53:14,124 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11450, loss[loss=0.0622, simple_loss=0.08407, pruned_loss=0.01111, audio_tagging_loss=0.009055, over 15462.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09061, pruned_loss=0.0128, audio_tagging_loss=0.0089, over 3048224.62 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:53:26,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3122420.0, ans=0.0 2023-11-25 23:54:03,222 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468400 2023-11-25 23:54:04,159 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.283e+01 9.283e+01 1.005e+02 1.593e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-25 23:54:09,208 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11500, loss[loss=0.06435, simple_loss=0.08656, pruned_loss=0.01155, audio_tagging_loss=0.009522, over 14337.00 frames. 
], tot_loss[loss=0.06707, simple_loss=0.09053, pruned_loss=0.01292, audio_tagging_loss=0.00889, over 3039044.67 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:54:09,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.77 vs. limit=10.0 2023-11-25 23:54:20,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3122753.3333333335, ans=0.0 2023-11-25 23:54:24,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3122753.3333333335, ans=0.0 2023-11-25 23:54:36,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3122820.0, ans=0.125 2023-11-25 23:54:41,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3122886.6666666665, ans=0.04949747468305833 2023-11-25 23:54:57,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468450 2023-11-25 23:55:00,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2023-11-25 23:55:01,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3122953.3333333335, ans=0.0 2023-11-25 23:55:03,506 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11550, loss[loss=0.0799, simple_loss=0.1092, pruned_loss=0.01398, audio_tagging_loss=0.0113, over 15517.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09094, pruned_loss=0.0129, audio_tagging_loss=0.008981, over 3044521.25 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:55:07,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3123020.0, ans=0.125 2023-11-25 23:55:07,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3123020.0, ans=0.125 2023-11-25 23:55:19,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0 2023-11-25 23:55:27,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3123153.3333333335, ans=0.95 2023-11-25 23:55:31,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3123153.3333333335, ans=0.125 2023-11-25 23:55:38,138 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:55:40,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3123220.0, ans=0.125 2023-11-25 23:55:52,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468500 2023-11-25 23:55:53,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.807e+01 9.353e+01 9.870e+01 1.294e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 23:55:53,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3123286.6666666665, ans=0.025 2023-11-25 23:55:57,354 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11600, loss[loss=0.09066, simple_loss=0.125, pruned_loss=0.01969, audio_tagging_loss=0.008479, over 15422.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09125, pruned_loss=0.01282, audio_tagging_loss=0.009027, over 3046864.24 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2023-11-25 23:55:58,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3123353.3333333335, ans=0.125 2023-11-25 23:55:59,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3123353.3333333335, ans=0.015 2023-11-25 23:56:18,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3123486.6666666665, ans=0.0 2023-11-25 23:56:40,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3123620.0, ans=0.2 2023-11-25 23:56:47,260 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468550 2023-11-25 23:56:52,391 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11650, loss[loss=0.05221, simple_loss=0.06342, pruned_loss=0.01047, audio_tagging_loss=0.01003, over 14225.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.0905, pruned_loss=0.01277, audio_tagging_loss=0.009063, over 3041339.09 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:57:07,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3123753.3333333335, ans=0.0 2023-11-25 23:57:20,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2023-11-25 23:57:22,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-25 23:57:29,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3123886.6666666665, ans=0.025 2023-11-25 23:57:37,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3123953.3333333335, ans=0.0 2023-11-25 23:57:41,962 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468600 2023-11-25 23:57:44,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.612e+01 9.119e+01 9.760e+01 1.208e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-25 23:57:47,433 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11700, loss[loss=0.05336, simple_loss=0.06644, pruned_loss=0.0107, audio_tagging_loss=0.009444, over 15005.00 frames. 
], tot_loss[loss=0.06704, simple_loss=0.09008, pruned_loss=0.01288, audio_tagging_loss=0.009118, over 3044279.85 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:58:08,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0 2023-11-25 23:58:15,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3124153.3333333335, ans=0.125 2023-11-25 23:58:16,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3124153.3333333335, ans=0.1 2023-11-25 23:58:19,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2023-11-25 23:58:36,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468650 2023-11-25 23:58:42,048 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11750, loss[loss=0.08363, simple_loss=0.09846, pruned_loss=0.02059, audio_tagging_loss=0.01381, over 16018.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09002, pruned_loss=0.01289, audio_tagging_loss=0.009156, over 3046970.20 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:58:43,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2023-11-25 23:58:52,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3124420.0, ans=0.2 2023-11-25 23:58:57,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3124420.0, ans=0.125 2023-11-25 23:59:21,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3124553.3333333335, ans=0.0 2023-11-25 23:59:32,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468700 2023-11-25 23:59:34,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.692e+01 9.354e+01 9.925e+01 1.548e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 23:59:37,271 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11800, loss[loss=0.08195, simple_loss=0.1158, pruned_loss=0.01848, audio_tagging_loss=0.005589, over 15673.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08934, pruned_loss=0.01277, audio_tagging_loss=0.009251, over 3039597.06 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:59:42,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3124686.6666666665, ans=0.0 2023-11-25 23:59:42,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3124686.6666666665, ans=0.05 2023-11-25 23:59:53,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=12.0 2023-11-25 23:59:55,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3124753.3333333335, ans=0.0 2023-11-26 00:00:08,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3124886.6666666665, ans=0.0 2023-11-26 00:00:24,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3124953.3333333335, ans=0.2 2023-11-26 00:00:26,143 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468750 2023-11-26 00:00:31,225 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11850, loss[loss=0.07347, simple_loss=0.1022, pruned_loss=0.01428, audio_tagging_loss=0.008115, over 15616.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.08984, pruned_loss=0.01293, audio_tagging_loss=0.009263, over 3047893.91 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:00:31,473 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:00:40,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.41 vs. limit=10.0 2023-11-26 00:00:48,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3125086.6666666665, ans=0.0 2023-11-26 00:01:00,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3125153.3333333335, ans=0.125 2023-11-26 00:01:00,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3125153.3333333335, ans=0.07 2023-11-26 00:01:20,114 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468800 2023-11-26 00:01:22,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.741e+01 9.224e+01 1.012e+02 1.182e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 00:01:25,618 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11900, loss[loss=0.06433, simple_loss=0.08994, pruned_loss=0.01194, audio_tagging_loss=0.007416, over 15700.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.0894, pruned_loss=0.0127, audio_tagging_loss=0.009263, over 3052483.42 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:02:08,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3125620.0, ans=0.0 2023-11-26 00:02:15,005 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468850 2023-11-26 00:02:16,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3125620.0, ans=0.125 2023-11-26 00:02:18,245 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2023-11-26 00:02:20,694 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11950, loss[loss=0.05691, simple_loss=0.06776, pruned_loss=0.0136, audio_tagging_loss=0.009428, over 14375.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08952, pruned_loss=0.01261, audio_tagging_loss=0.009279, over 3047784.37 frames. 
], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:02:27,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3125686.6666666665, ans=0.0 2023-11-26 00:02:27,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0 2023-11-26 00:02:28,108 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2023-11-26 00:03:09,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468900 2023-11-26 00:03:11,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.661e+01 9.250e+01 9.933e+01 1.391e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 00:03:14,635 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 12000, loss[loss=0.07282, simple_loss=0.09425, pruned_loss=0.01273, audio_tagging_loss=0.01297, over 14464.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08892, pruned_loss=0.01249, audio_tagging_loss=0.009404, over 3038153.33 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0 2023-11-26 00:03:14,636 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 00:03:29,944 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7734, 4.8939, 4.9220, 4.8244], device='cuda:2') 2023-11-26 00:03:37,279 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2289, 2.9538, 3.2976, 3.0245, 3.7326, 3.7640, 3.3197, 3.2493], device='cuda:2') 2023-11-26 00:03:47,127 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05809, simple_loss=0.05065, pruned_loss=0.005132, audio_tagging_loss=0.02764, over 4681554.00 frames. 2023-11-26 00:03:47,128 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 00:03:52,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3126020.0, ans=0.125 2023-11-26 00:04:10,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126153.3333333335, ans=0.1 2023-11-26 00:04:40,565 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 0, loss[loss=0.08651, simple_loss=0.1066, pruned_loss=0.01305, audio_tagging_loss=0.02017, over 14542.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.1066, pruned_loss=0.01305, audio_tagging_loss=0.02017, over 14542.00 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:04:40,565 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 00:05:12,154 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005121, audio_tagging_loss=0.02738, over 4681554.00 frames. 
2023-11-26 00:05:12,155 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 00:05:19,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3126186.6666666665, ans=0.04949747468305833 2023-11-26 00:05:34,113 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468950 2023-11-26 00:05:39,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2023-11-26 00:05:40,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3126320.0, ans=0.125 2023-11-26 00:05:56,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=22.5 2023-11-26 00:05:58,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3126453.3333333335, ans=0.125 2023-11-26 00:06:06,233 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:06:07,115 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 50, loss[loss=0.06407, simple_loss=0.07989, pruned_loss=0.00856, audio_tagging_loss=0.01556, over 14922.00 frames. ], tot_loss[loss=0.07369, simple_loss=0.08812, pruned_loss=0.01203, audio_tagging_loss=0.01759, over 681307.40 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:06:22,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3126586.6666666665, ans=0.125 2023-11-26 00:06:28,964 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469000 2023-11-26 00:06:29,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2023-11-26 00:06:32,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.207e+01 9.971e+01 1.067e+02 1.313e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-26 00:06:56,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3126786.6666666665, ans=0.05 2023-11-26 00:07:01,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3126853.3333333335, ans=0.5 2023-11-26 00:07:02,694 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 100, loss[loss=0.07511, simple_loss=0.09609, pruned_loss=0.008709, audio_tagging_loss=0.01836, over 15838.00 frames. ], tot_loss[loss=0.07388, simple_loss=0.0891, pruned_loss=0.01227, audio_tagging_loss=0.01706, over 1204595.71 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:07:03,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3126853.3333333335, ans=0.0 2023-11-26 00:07:10,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3126853.3333333335, ans=0.1 2023-11-26 00:07:14,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3126920.0, ans=0.125 2023-11-26 00:07:25,701 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469050 2023-11-26 00:07:35,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=15.0 2023-11-26 00:07:35,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-26 00:07:40,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3127053.3333333335, ans=0.0 2023-11-26 00:07:45,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2023-11-26 00:07:48,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3127120.0, ans=10.0 2023-11-26 00:07:58,595 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 150, loss[loss=0.07432, simple_loss=0.1011, pruned_loss=0.01377, audio_tagging_loss=0.01, over 15031.00 frames. ], tot_loss[loss=0.07144, simple_loss=0.08851, pruned_loss=0.01209, audio_tagging_loss=0.0151, over 1614222.96 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:08:15,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3127253.3333333335, ans=0.125 2023-11-26 00:08:15,522 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. limit=10.0 2023-11-26 00:08:18,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3127253.3333333335, ans=0.2 2023-11-26 00:08:21,751 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469100 2023-11-26 00:08:24,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.020e+01 9.615e+01 1.041e+02 1.301e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 00:08:54,981 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 200, loss[loss=0.0815, simple_loss=0.1113, pruned_loss=0.02035, audio_tagging_loss=0.005495, over 15472.00 frames. ], tot_loss[loss=0.0707, simple_loss=0.08984, pruned_loss=0.01246, audio_tagging_loss=0.01331, over 1934898.47 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:08:55,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3127520.0, ans=0.05 2023-11-26 00:09:04,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3127520.0, ans=0.125 2023-11-26 00:09:07,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3127586.6666666665, ans=0.125 2023-11-26 00:09:16,955 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469150 2023-11-26 00:09:28,150 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:09:37,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3127720.0, ans=0.1 2023-11-26 00:09:37,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3127720.0, ans=0.2 2023-11-26 00:09:45,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3127786.6666666665, ans=0.1 2023-11-26 00:09:50,287 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 250, loss[loss=0.07399, simple_loss=0.1027, pruned_loss=0.01478, audio_tagging_loss=0.00787, over 14111.00 frames. ], tot_loss[loss=0.0696, simple_loss=0.09019, pruned_loss=0.01259, audio_tagging_loss=0.01192, over 2187241.75 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:09:55,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2023-11-26 00:10:05,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3127920.0, ans=0.0 2023-11-26 00:10:12,776 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469200 2023-11-26 00:10:12,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3127986.6666666665, ans=0.1 2023-11-26 00:10:17,834 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.769e+01 9.325e+01 1.022e+02 1.435e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 00:10:31,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-26 00:10:35,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3128120.0, ans=0.0 2023-11-26 00:10:45,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3128186.6666666665, ans=0.0 2023-11-26 00:10:46,176 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 300, loss[loss=0.06025, simple_loss=0.08096, pruned_loss=0.01069, audio_tagging_loss=0.009085, over 15370.00 frames. ], tot_loss[loss=0.06901, simple_loss=0.09098, pruned_loss=0.01252, audio_tagging_loss=0.011, over 2380070.86 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:11:09,537 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469250 2023-11-26 00:11:16,261 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2023-11-26 00:11:36,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3128453.3333333335, ans=0.125 2023-11-26 00:11:42,964 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 350, loss[loss=0.03797, simple_loss=0.04714, pruned_loss=0.005573, audio_tagging_loss=0.008826, over 15146.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.08937, pruned_loss=0.01225, audio_tagging_loss=0.01054, over 2530889.46 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:11:47,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3128520.0, ans=0.0 2023-11-26 00:11:54,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-11-26 00:11:55,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3128586.6666666665, ans=0.125 2023-11-26 00:12:03,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3128653.3333333335, ans=0.0 2023-11-26 00:12:04,823 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469300 2023-11-26 00:12:08,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.708e+01 9.325e+01 9.980e+01 1.485e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 00:12:16,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3128720.0, ans=0.0 2023-11-26 00:12:16,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-26 00:12:17,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3128720.0, ans=0.0 2023-11-26 00:12:19,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3128720.0, ans=0.125 2023-11-26 00:12:24,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=15.0 2023-11-26 00:12:38,374 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 400, loss[loss=0.05996, simple_loss=0.0804, pruned_loss=0.008592, audio_tagging_loss=0.01117, over 15335.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.08939, pruned_loss=0.01231, audio_tagging_loss=0.01017, over 2642936.20 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:12:38,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3128853.3333333335, ans=0.125 2023-11-26 00:13:00,056 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469350 2023-11-26 00:13:00,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. 
limit=22.5 2023-11-26 00:13:32,805 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 450, loss[loss=0.07723, simple_loss=0.1076, pruned_loss=0.01635, audio_tagging_loss=0.007109, over 15533.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.08988, pruned_loss=0.01237, audio_tagging_loss=0.009827, over 2734544.04 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:13:36,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3129186.6666666665, ans=10.0 2023-11-26 00:13:37,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3129186.6666666665, ans=0.0 2023-11-26 00:13:42,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3129186.6666666665, ans=0.1 2023-11-26 00:13:48,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3129253.3333333335, ans=0.0 2023-11-26 00:13:56,328 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469400 2023-11-26 00:14:00,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.743e+01 9.299e+01 9.864e+01 1.390e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 00:14:04,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3129320.0, ans=0.125 2023-11-26 00:14:11,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3129386.6666666665, ans=0.07 2023-11-26 00:14:14,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3129386.6666666665, ans=0.125 2023-11-26 00:14:26,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3129453.3333333335, ans=0.025 2023-11-26 00:14:27,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3129453.3333333335, ans=0.125 2023-11-26 00:14:28,994 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 500, loss[loss=0.09074, simple_loss=0.1312, pruned_loss=0.01846, audio_tagging_loss=0.006682, over 15800.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.08932, pruned_loss=0.0124, audio_tagging_loss=0.009651, over 2795611.47 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:14:37,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3129520.0, ans=0.2 2023-11-26 00:14:47,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3129586.6666666665, ans=0.025 2023-11-26 00:14:51,404 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469450 2023-11-26 00:15:03,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3129720.0, ans=0.125 2023-11-26 00:15:14,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3129786.6666666665, ans=0.1 2023-11-26 00:15:24,640 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 550, loss[loss=0.0671, simple_loss=0.09515, pruned_loss=0.01182, audio_tagging_loss=0.00771, over 15218.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09009, pruned_loss=0.01246, audio_tagging_loss=0.009525, over 2852268.60 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:15:30,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2023-11-26 00:15:34,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3129920.0, ans=0.125 2023-11-26 00:15:37,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-26 00:15:40,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3129920.0, ans=0.2 2023-11-26 00:15:46,709 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469500 2023-11-26 00:15:46,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3129986.6666666665, ans=0.2 2023-11-26 00:15:49,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3129986.6666666665, ans=0.125 2023-11-26 00:15:50,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.611e+01 9.176e+01 9.917e+01 4.186e+02, threshold=1.835e+02, percent-clipped=1.0 2023-11-26 00:16:03,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-26 00:16:03,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-26 00:16:07,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-26 00:16:07,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=22.5 2023-11-26 00:16:19,921 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 600, loss[loss=0.08171, simple_loss=0.1139, pruned_loss=0.01493, audio_tagging_loss=0.009851, over 15837.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09018, pruned_loss=0.01238, audio_tagging_loss=0.009448, over 2896356.48 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:16:29,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3130186.6666666665, ans=0.125 2023-11-26 00:16:43,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469550 2023-11-26 00:16:49,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3130320.0, ans=0.125 2023-11-26 00:16:51,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3130320.0, ans=0.0 2023-11-26 00:16:55,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3130386.6666666665, ans=0.2 2023-11-26 00:16:56,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3130386.6666666665, ans=0.0 2023-11-26 00:17:02,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3130386.6666666665, ans=0.0 2023-11-26 00:17:04,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3130453.3333333335, ans=0.125 2023-11-26 00:17:10,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-26 00:17:16,592 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 650, loss[loss=0.07832, simple_loss=0.1095, pruned_loss=0.01491, audio_tagging_loss=0.008641, over 16102.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08983, pruned_loss=0.01239, audio_tagging_loss=0.009309, over 2928045.58 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:17:23,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3130520.0, ans=0.1 2023-11-26 00:17:29,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3130586.6666666665, ans=0.0 2023-11-26 00:17:32,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3130586.6666666665, ans=0.2 2023-11-26 00:17:35,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2023-11-26 00:17:39,113 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469600 2023-11-26 00:17:43,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.552e+01 9.119e+01 9.990e+01 1.151e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 00:18:12,547 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 700, loss[loss=0.06998, simple_loss=0.1001, pruned_loss=0.01316, audio_tagging_loss=0.006777, over 15521.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08972, pruned_loss=0.01242, audio_tagging_loss=0.009231, over 2955230.09 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:18:22,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3130853.3333333335, ans=22.5 2023-11-26 00:18:34,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469650 2023-11-26 00:19:00,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3131120.0, ans=0.125 2023-11-26 00:19:07,763 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 750, loss[loss=0.07361, simple_loss=0.1083, pruned_loss=0.01197, audio_tagging_loss=0.007499, over 15527.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.0912, pruned_loss=0.01262, audio_tagging_loss=0.009069, over 2977998.63 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:19:12,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3131186.6666666665, ans=0.125 2023-11-26 00:19:22,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3131253.3333333335, ans=0.2 2023-11-26 00:19:25,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3131253.3333333335, ans=0.125 2023-11-26 00:19:29,612 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469700 2023-11-26 00:19:34,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.495e+01 9.390e+01 9.960e+01 1.200e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 00:19:57,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3131453.3333333335, ans=0.1 2023-11-26 00:20:03,197 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 800, loss[loss=0.06337, simple_loss=0.09407, pruned_loss=0.00986, audio_tagging_loss=0.006471, over 16804.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09208, pruned_loss=0.01286, audio_tagging_loss=0.00898, over 2995662.61 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:20:04,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.30 vs. limit=22.5 2023-11-26 00:20:18,780 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:20:22,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3131586.6666666665, ans=0.0 2023-11-26 00:20:25,577 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469750 2023-11-26 00:20:40,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-11-26 00:20:52,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3131786.6666666665, ans=0.125 2023-11-26 00:20:54,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3131786.6666666665, ans=0.0 2023-11-26 00:20:59,449 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 850, loss[loss=0.106, simple_loss=0.1562, pruned_loss=0.02002, audio_tagging_loss=0.007889, over 15606.00 frames. 
], tot_loss[loss=0.068, simple_loss=0.09212, pruned_loss=0.01289, audio_tagging_loss=0.009051, over 3002394.43 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:21:16,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3131920.0, ans=0.0 2023-11-26 00:21:21,199 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469800 2023-11-26 00:21:26,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.683e+01 9.047e+01 1.001e+02 1.303e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-26 00:21:34,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3132053.3333333335, ans=0.1 2023-11-26 00:21:34,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3132053.3333333335, ans=0.125 2023-11-26 00:21:35,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3132053.3333333335, ans=0.0 2023-11-26 00:21:42,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3132053.3333333335, ans=0.025 2023-11-26 00:21:49,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3132120.0, ans=0.1 2023-11-26 00:21:55,550 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 900, loss[loss=0.06045, simple_loss=0.09301, pruned_loss=0.008162, audio_tagging_loss=0.005782, over 14330.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09185, pruned_loss=0.01278, audio_tagging_loss=0.009147, over 3013516.46 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:22:10,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3132253.3333333335, ans=0.0 2023-11-26 00:22:18,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469850 2023-11-26 00:22:22,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3132320.0, ans=0.125 2023-11-26 00:22:32,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3132386.6666666665, ans=0.1 2023-11-26 00:22:52,230 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 950, loss[loss=0.08113, simple_loss=0.1125, pruned_loss=0.01669, audio_tagging_loss=0.008208, over 14710.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09188, pruned_loss=0.01285, audio_tagging_loss=0.009146, over 3022661.83 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:23:08,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3132586.6666666665, ans=0.0 2023-11-26 00:23:14,129 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469900 2023-11-26 00:23:19,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.462e+01 9.352e+01 1.021e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 00:23:29,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.10 vs. 
limit=15.0 2023-11-26 00:23:34,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3132720.0, ans=0.0 2023-11-26 00:23:47,645 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1000, loss[loss=0.05587, simple_loss=0.08085, pruned_loss=0.00918, audio_tagging_loss=0.00627, over 14529.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09165, pruned_loss=0.0128, audio_tagging_loss=0.008965, over 3030155.38 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:23:51,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-11-26 00:24:02,346 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:24:08,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2023-11-26 00:24:10,094 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469950 2023-11-26 00:24:12,218 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:24:13,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3132986.6666666665, ans=0.2 2023-11-26 00:24:23,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3133053.3333333335, ans=0.125 2023-11-26 00:24:35,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3133120.0, ans=15.0 2023-11-26 00:24:43,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3133186.6666666665, ans=0.125 2023-11-26 00:24:43,925 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1050, loss[loss=0.0769, simple_loss=0.1029, pruned_loss=0.0188, audio_tagging_loss=0.006671, over 14478.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09199, pruned_loss=0.01293, audio_tagging_loss=0.008859, over 3042288.23 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:25:01,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3133253.3333333335, ans=0.1 2023-11-26 00:25:02,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3133253.3333333335, ans=0.0 2023-11-26 00:25:06,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470000 2023-11-26 00:25:09,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3133320.0, ans=0.0 2023-11-26 00:25:12,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.549e+01 9.309e+01 1.004e+02 1.287e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 00:25:40,147 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1100, loss[loss=0.08834, simple_loss=0.127, pruned_loss=0.01757, audio_tagging_loss=0.007263, over 15826.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09124, pruned_loss=0.01292, audio_tagging_loss=0.008926, over 3040119.51 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:25:44,377 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:25:45,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3133520.0, ans=0.125 2023-11-26 00:25:49,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3133520.0, ans=0.125 2023-11-26 00:25:58,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3133586.6666666665, ans=0.0 2023-11-26 00:26:00,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-26 00:26:02,878 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470050 2023-11-26 00:26:08,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3133653.3333333335, ans=0.2 2023-11-26 00:26:29,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3133786.6666666665, ans=0.125 2023-11-26 00:26:36,638 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1150, loss[loss=0.06355, simple_loss=0.08616, pruned_loss=0.008715, audio_tagging_loss=0.01175, over 15230.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.0905, pruned_loss=0.01273, audio_tagging_loss=0.00889, over 3041460.87 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:26:39,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.43 vs. 
limit=22.5 2023-11-26 00:26:43,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3133853.3333333335, ans=0.125 2023-11-26 00:26:48,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3133920.0, ans=0.02 2023-11-26 00:26:58,329 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470100 2023-11-26 00:27:04,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.538e+01 9.163e+01 1.008e+02 1.257e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 00:27:05,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3133986.6666666665, ans=0.125 2023-11-26 00:27:06,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3133986.6666666665, ans=0.0 2023-11-26 00:27:13,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3134053.3333333335, ans=0.0 2023-11-26 00:27:15,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=22.5 2023-11-26 00:27:32,281 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1200, loss[loss=0.06329, simple_loss=0.08012, pruned_loss=0.01306, audio_tagging_loss=0.01018, over 13872.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09069, pruned_loss=0.01281, audio_tagging_loss=0.008836, over 3035955.38 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:27:40,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3134186.6666666665, ans=0.125 2023-11-26 00:27:51,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3134253.3333333335, ans=0.125 2023-11-26 00:27:55,243 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470150 2023-11-26 00:28:08,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3134386.6666666665, ans=0.125 2023-11-26 00:28:16,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3134453.3333333335, ans=0.0 2023-11-26 00:28:27,664 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1250, loss[loss=0.05988, simple_loss=0.07706, pruned_loss=0.01, audio_tagging_loss=0.01135, over 16249.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09019, pruned_loss=0.01272, audio_tagging_loss=0.008831, over 3036375.02 frames. 
], batch size: 61, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:28:36,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3134520.0, ans=0.0 2023-11-26 00:28:37,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3134520.0, ans=0.5 2023-11-26 00:28:46,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3134586.6666666665, ans=0.125 2023-11-26 00:28:50,671 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470200 2023-11-26 00:28:50,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3134653.3333333335, ans=0.125 2023-11-26 00:28:51,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3134653.3333333335, ans=0.1 2023-11-26 00:28:51,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3134653.3333333335, ans=0.05 2023-11-26 00:28:56,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.537e+01 9.082e+01 9.508e+01 1.462e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 00:28:56,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3134653.3333333335, ans=10.0 2023-11-26 00:29:03,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-26 00:29:23,795 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1300, loss[loss=0.04538, simple_loss=0.0599, pruned_loss=0.003696, audio_tagging_loss=0.01173, over 14044.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09002, pruned_loss=0.01272, audio_tagging_loss=0.008836, over 3034144.53 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:29:31,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3134853.3333333335, ans=0.1 2023-11-26 00:29:45,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470250 2023-11-26 00:29:53,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3134986.6666666665, ans=0.07 2023-11-26 00:30:11,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135120.0, ans=0.1 2023-11-26 00:30:19,466 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1350, loss[loss=0.05868, simple_loss=0.07513, pruned_loss=0.012, audio_tagging_loss=0.00912, over 14951.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.0906, pruned_loss=0.01279, audio_tagging_loss=0.008876, over 3033546.33 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:30:41,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470300 2023-11-26 00:30:47,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.371e+01 9.120e+01 9.741e+01 1.134e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 00:31:00,783 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
2023-11-26 00:31:04,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3135453.3333333335, ans=0.0 2023-11-26 00:31:14,771 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1400, loss[loss=0.07242, simple_loss=0.09815, pruned_loss=0.01341, audio_tagging_loss=0.009941, over 16232.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.0904, pruned_loss=0.01273, audio_tagging_loss=0.009038, over 3042060.06 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:31:16,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135520.0, ans=0.1 2023-11-26 00:31:22,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3135520.0, ans=0.0 2023-11-26 00:31:38,594 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470350 2023-11-26 00:31:57,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3135720.0, ans=0.125 2023-11-26 00:32:05,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3135786.6666666665, ans=0.125 2023-11-26 00:32:08,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-11-26 00:32:11,754 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1450, loss[loss=0.06735, simple_loss=0.09148, pruned_loss=0.01391, audio_tagging_loss=0.007699, over 15433.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09033, pruned_loss=0.01269, audio_tagging_loss=0.009103, over 3040407.35 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:32:15,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.12 vs.
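limit=10.0

The scaling.py:1022 lines track a whitening diagnostic: each Whiten module measures how far the covariance of its activations is from a multiple of the identity, and its penalty only kicks in once the metric exceeds the configured limit (6.12 vs. 10.0 here, so no penalty). One standard way to express such a metric, shown as an illustrative stand-in rather than the exact scaling.py formula:

    # Illustrative whitening metric: mean(eig^2) / mean(eig)^2 over the
    # covariance eigenvalues. It is 1.0 for perfectly white activations and
    # grows as the spectrum becomes lopsided. A stand-in, not the icefall code.
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

    white = torch.randn(10000, 144)                 # metric close to 1.0
    skewed = white * torch.linspace(0.1, 3.0, 144)  # lopsided spectrum
    assert whitening_metric(white) < whitening_metric(skewed)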
2023-11-26 00:32:21,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3135853.3333333335, ans=0.125 2023-11-26 00:32:28,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3135920.0, ans=0.0 2023-11-26 00:32:29,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135920.0, ans=0.1 2023-11-26 00:32:33,898 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470400 2023-11-26 00:32:40,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.703e+01 9.390e+01 1.022e+02 1.337e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 00:32:52,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3136053.3333333335, ans=0.125 2023-11-26 00:33:04,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3136120.0, ans=0.125 2023-11-26 00:33:06,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3136120.0, ans=0.125 2023-11-26 00:33:06,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3136120.0, ans=0.1 2023-11-26 00:33:08,192 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1500, loss[loss=0.07248, simple_loss=0.0917, pruned_loss=0.01647, audio_tagging_loss=0.01017, over 14507.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.0904, pruned_loss=0.01284, audio_tagging_loss=0.009124, over 3044947.61 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:33:08,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3136186.6666666665, ans=0.0 2023-11-26 00:33:09,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3136186.6666666665, ans=0.125 2023-11-26 00:33:18,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136253.3333333335, ans=0.1 2023-11-26 00:33:30,740 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470450 2023-11-26 00:33:40,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2023-11-26 00:33:40,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs.
limit=10.0 2023-11-26 00:33:42,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3136386.6666666665, ans=0.125 2023-11-26 00:33:54,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-26 00:33:59,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3136453.3333333335, ans=0.2 2023-11-26 00:34:01,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136453.3333333335, ans=0.1 2023-11-26 00:34:03,651 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1550, loss[loss=0.06469, simple_loss=0.07994, pruned_loss=0.01339, audio_tagging_loss=0.01134, over 15502.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09099, pruned_loss=0.01299, audio_tagging_loss=0.009094, over 3053148.93 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:34:26,667 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470500 2023-11-26 00:34:33,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.668e+01 9.304e+01 9.957e+01 1.824e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 00:34:35,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3136653.3333333335, ans=0.2 2023-11-26 00:34:41,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3136720.0, ans=0.1 2023-11-26 00:34:45,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136720.0, ans=0.1 2023-11-26 00:34:57,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3136786.6666666665, ans=0.0 2023-11-26 00:34:59,524 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1600, loss[loss=0.03752, simple_loss=0.05125, pruned_loss=0.00156, audio_tagging_loss=0.01034, over 15967.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.08991, pruned_loss=0.01286, audio_tagging_loss=0.009226, over 3049131.24 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:35:03,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3136853.3333333335, ans=0.09899494936611666 2023-11-26 00:35:22,167 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470550 2023-11-26 00:35:24,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3136986.6666666665, ans=0.125 2023-11-26 00:35:37,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3137053.3333333335, ans=0.125 2023-11-26 00:35:51,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3137120.0, ans=0.1 2023-11-26 00:35:55,997 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1650, loss[loss=0.05116, simple_loss=0.05895, pruned_loss=0.009352, audio_tagging_loss=0.01233, over 14832.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.08992, pruned_loss=0.01263, audio_tagging_loss=0.009285, over 3050523.09 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:36:04,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3137186.6666666665, ans=0.2 2023-11-26 00:36:17,252 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-26 00:36:24,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.506e+01 9.125e+01 1.020e+02 1.203e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 00:36:26,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3137320.0, ans=0.0 2023-11-26 00:36:30,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2023-11-26 00:36:36,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0 2023-11-26 00:36:41,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3137453.3333333335, ans=0.125 2023-11-26 00:36:51,027 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1700, loss[loss=0.05639, simple_loss=0.08105, pruned_loss=0.007434, audio_tagging_loss=0.00843, over 15021.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.0894, pruned_loss=0.0125, audio_tagging_loss=0.009303, over 3051887.00 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:36:58,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3137520.0, ans=0.0 2023-11-26 00:37:01,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3137586.6666666665, ans=0.0 2023-11-26 00:37:04,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3137586.6666666665, ans=0.0 2023-11-26 00:37:05,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2023-11-26 00:37:09,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3137586.6666666665, ans=0.2 2023-11-26 00:37:13,040 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-26 00:37:18,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3137653.3333333335, ans=0.04949747468305833 2023-11-26 00:37:24,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3137720.0, ans=0.09899494936611666 2023-11-26 00:37:46,332 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1750, loss[loss=0.08145, simple_loss=0.1057, pruned_loss=0.01582, audio_tagging_loss=0.01276, over 15629.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08993, pruned_loss=0.01253, audio_tagging_loss=0.009079, over 3055225.24 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:37:54,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.38 vs. 
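limit=15.0

The scaling.py:213 lines record ScheduledFloat values: hyperparameters such as dropout p, skip rates and balancer bounds that are functions of batch_count rather than constants, with ans showing the value currently in effect. A minimal sketch of the idea, assuming plain linear interpolation between (batch_count, value) breakpoints (the real ScheduledFloat in icefall's scaling.py carries more machinery):

    # Minimal scheduled-float sketch: piecewise-linear in batch_count, clamped
    # at the ends. The breakpoint values below are made up for illustration.
    class ScheduledFloatSketch:
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    # A dropout decaying from 0.3 to 0.1 over the first 20k batches: by the
    # batch_count ~3.14e6 seen here it sits at its final value.
    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(3137920.0) == 0.1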
2023-11-26 00:37:56,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3137920.0, ans=0.125 2023-11-26 00:38:05,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-26 00:38:06,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-26 00:38:08,764 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-26 00:38:16,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.541e+01 8.977e+01 9.696e+01 1.531e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-26 00:38:16,405 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:38:18,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2023-11-26 00:38:22,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138053.3333333335, ans=0.1 2023-11-26 00:38:31,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3138120.0, ans=0.0 2023-11-26 00:38:42,295 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1800, loss[loss=0.07185, simple_loss=0.097, pruned_loss=0.01354, audio_tagging_loss=0.009812, over 15471.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09029, pruned_loss=0.01253, audio_tagging_loss=0.008923, over 3049906.56 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:38:55,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3138253.3333333335, ans=0.0 2023-11-26 00:39:03,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470750 2023-11-26 00:39:13,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3138320.0, ans=0.0 2023-11-26 00:39:27,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3138453.3333333335, ans=0.1 2023-11-26 00:39:37,365 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1850, loss[loss=0.0713, simple_loss=0.09393, pruned_loss=0.01492, audio_tagging_loss=0.00941, over 15346.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09055, pruned_loss=0.01252, audio_tagging_loss=0.008852, over 3053951.98 frames.
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:39:37,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138520.0, ans=0.1 2023-11-26 00:39:41,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3138520.0, ans=0.125 2023-11-26 00:39:59,030 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-26 00:40:03,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3138653.3333333335, ans=0.125 2023-11-26 00:40:07,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.736e+01 9.136e+01 9.723e+01 1.171e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 00:40:12,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3138720.0, ans=0.2 2023-11-26 00:40:32,779 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1900, loss[loss=0.08132, simple_loss=0.11, pruned_loss=0.01657, audio_tagging_loss=0.00975, over 15074.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09197, pruned_loss=0.01267, audio_tagging_loss=0.008819, over 3058054.20 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:40:41,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3138853.3333333335, ans=0.125 2023-11-26 00:40:55,381 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-26 00:41:16,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.71 vs. limit=22.5 2023-11-26 00:41:28,557 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1950, loss[loss=0.07425, simple_loss=0.1017, pruned_loss=0.01472, audio_tagging_loss=0.008698, over 16041.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09198, pruned_loss=0.01269, audio_tagging_loss=0.008731, over 3058327.55 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:41:50,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-26 00:41:53,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=22.5 2023-11-26 00:41:55,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2023-11-26 00:41:59,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.427e+01 9.159e+01 1.002e+02 1.233e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 00:42:02,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3139386.6666666665, ans=0.125 2023-11-26 00:42:09,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-26 00:42:24,521 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2000, loss[loss=0.05011, simple_loss=0.06552, pruned_loss=0.009047, audio_tagging_loss=0.008302, over 15568.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09097, pruned_loss=0.01255, audio_tagging_loss=0.008776, over 3053344.24 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:42:43,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.33 vs. limit=15.0 2023-11-26 00:42:46,812 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-26 00:43:11,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3139786.6666666665, ans=0.125 2023-11-26 00:43:19,548 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2050, loss[loss=0.05035, simple_loss=0.07143, pruned_loss=0.007714, audio_tagging_loss=0.006918, over 15948.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.0908, pruned_loss=0.01268, audio_tagging_loss=0.008769, over 3049147.83 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:43:29,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3139920.0, ans=0.0 2023-11-26 00:43:41,857 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-26 00:43:49,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3139986.6666666665, ans=0.0 2023-11-26 00:43:50,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.583e+01 9.206e+01 9.963e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 00:43:52,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-26 00:43:54,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-26 00:43:56,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-26 00:44:08,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3140120.0, ans=0.0 2023-11-26 00:44:15,649 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2100, loss[loss=0.08491, simple_loss=0.1179, pruned_loss=0.01757, audio_tagging_loss=0.008376, over 15195.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09083, pruned_loss=0.01259, audio_tagging_loss=0.00875, over 3051210.19 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:44:16,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-26 00:44:22,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3140186.6666666665, ans=0.0 2023-11-26 00:44:23,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3140186.6666666665, ans=0.0 2023-11-26 00:44:38,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471050 2023-11-26 00:45:05,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3140453.3333333335, ans=0.125 2023-11-26 00:45:11,045 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2150, loss[loss=0.08872, simple_loss=0.11, pruned_loss=0.0229, audio_tagging_loss=0.01079, over 15615.00 frames. 
], tot_loss[loss=0.06708, simple_loss=0.09114, pruned_loss=0.01275, audio_tagging_loss=0.008758, over 3051307.58 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:45:29,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3140586.6666666665, ans=0.2 2023-11-26 00:45:33,536 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-26 00:45:33,632 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:45:41,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.773e+01 9.255e+01 9.995e+01 1.124e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 00:45:45,891 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:45:49,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3140720.0, ans=0.0 2023-11-26 00:45:59,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3140786.6666666665, ans=0.07 2023-11-26 00:46:06,709 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2200, loss[loss=0.05949, simple_loss=0.07969, pruned_loss=0.01126, audio_tagging_loss=0.008383, over 14639.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09156, pruned_loss=0.01279, audio_tagging_loss=0.008801, over 3044026.11 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:46:08,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3140853.3333333335, ans=0.0 2023-11-26 00:46:23,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3140920.0, ans=0.125 2023-11-26 00:46:29,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-26 00:46:46,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141053.3333333335, ans=0.1 2023-11-26 00:46:49,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3141120.0, ans=0.125 2023-11-26 00:46:49,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3141120.0, ans=0.0 2023-11-26 00:46:50,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. 
limit=15.0 2023-11-26 00:46:51,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3141120.0, ans=0.125 2023-11-26 00:47:00,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3141186.6666666665, ans=0.125 2023-11-26 00:47:01,672 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2250, loss[loss=0.06858, simple_loss=0.08993, pruned_loss=0.01305, audio_tagging_loss=0.01056, over 15560.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.0913, pruned_loss=0.0128, audio_tagging_loss=0.008858, over 3044012.14 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:47:03,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3141186.6666666665, ans=0.125 2023-11-26 00:47:18,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3141253.3333333335, ans=0.02 2023-11-26 00:47:23,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-26 00:47:23,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3141320.0, ans=0.0 2023-11-26 00:47:25,158 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:47:32,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.619e+01 9.398e+01 1.009e+02 1.153e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 00:47:35,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3141386.6666666665, ans=0.2 2023-11-26 00:47:39,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3141386.6666666665, ans=0.125 2023-11-26 00:47:43,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2023-11-26 00:47:45,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3141453.3333333335, ans=0.0 2023-11-26 00:47:48,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3141453.3333333335, ans=0.125 2023-11-26 00:47:52,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3141453.3333333335, ans=0.0 2023-11-26 00:47:53,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3141453.3333333335, ans=0.125 2023-11-26 00:47:57,260 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2300, loss[loss=0.06691, simple_loss=0.08346, pruned_loss=0.01345, audio_tagging_loss=0.01173, over 15659.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09138, pruned_loss=0.01294, audio_tagging_loss=0.008844, over 3046316.36 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:48:19,695 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-26 00:48:32,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3141720.0, ans=15.0 2023-11-26 00:48:34,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3141720.0, ans=0.0 2023-11-26 00:48:39,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3141720.0, ans=0.2 2023-11-26 00:48:41,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3141786.6666666665, ans=0.1 2023-11-26 00:48:43,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:44,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:46,543 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:48:47,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:52,333 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2350, loss[loss=0.05637, simple_loss=0.07841, pruned_loss=0.008618, audio_tagging_loss=0.008547, over 14429.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09009, pruned_loss=0.01275, audio_tagging_loss=0.009005, over 3038358.23 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:49:11,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3141920.0, ans=0.125 2023-11-26 00:49:14,615 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471300 2023-11-26 00:49:21,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3141986.6666666665, ans=0.0 2023-11-26 00:49:22,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.37 vs. 
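limit=15.0

In the optim.py:476 records (one follows immediately), the optimizer summarizes recent gradient norms as five quantiles (min, 25%, median, 75%, max) and derives the clipping threshold from them: with Clipping_scale=2.0 the threshold tracks roughly twice the median (2 x 9.249e+01 ~= 1.850e+02 below), and percent-clipped says how often batches exceeded it. It is 0.0 through most of this stretch and reaches 2.0 only around batch 2950, where the max of 2.175e+02 tops the 1.870e+02 threshold. A hedged sketch of that bookkeeping (ScaledAdam's actual logic in icefall's optim.py is more involved):

    # Sketch of median-based gradient clipping with quantile logging, assuming
    # threshold = clipping_scale * median of recent grad norms, which matches
    # the numbers in the optim.py:476 lines. Not the actual icefall code.
    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms: deque[float] = deque(maxlen=window)

        def clip_(self, params: list) -> float:
            # max_norm=inf makes clip_grad_norm_ only measure the total norm.
            norm = float(torch.nn.utils.clip_grad_norm_(params, float("inf")))
            self.norms.append(norm)
            hist = torch.tensor(list(self.norms))
            q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * float(q[2])  # 2.0 * median
            if norm > threshold:
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold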
2023-11-26 00:49:23,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.557e+01 9.249e+01 9.915e+01 1.418e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 00:49:27,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3142053.3333333335, ans=0.1 2023-11-26 00:49:30,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3142053.3333333335, ans=0.2 2023-11-26 00:49:33,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3142053.3333333335, ans=0.2 2023-11-26 00:49:44,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3142120.0, ans=0.125 2023-11-26 00:49:48,003 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2400, loss[loss=0.05713, simple_loss=0.06887, pruned_loss=0.0103, audio_tagging_loss=0.01239, over 15219.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09003, pruned_loss=0.01255, audio_tagging_loss=0.00897, over 3035309.99 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:49:51,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=12.0 2023-11-26 00:50:09,801 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471350 2023-11-26 00:50:10,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-26 00:50:41,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142453.3333333335, ans=0.125 2023-11-26 00:50:43,038 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2450, loss[loss=0.05797, simple_loss=0.08363, pruned_loss=0.008739, audio_tagging_loss=0.007415, over 14796.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08978, pruned_loss=0.01243, audio_tagging_loss=0.009015, over 3036280.86 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:50:46,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-26 00:50:53,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3142586.6666666665, ans=0.09899494936611666 2023-11-26 00:51:00,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2023-11-26 00:51:04,639 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-26 00:51:13,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.694e+01 9.441e+01 1.025e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 00:51:17,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3142720.0, ans=0.0 2023-11-26 00:51:18,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.68 vs.
limit=15.0 2023-11-26 00:51:29,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3142786.6666666665, ans=0.125 2023-11-26 00:51:34,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3142786.6666666665, ans=0.0 2023-11-26 00:51:36,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3142853.3333333335, ans=0.125 2023-11-26 00:51:37,662 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2500, loss[loss=0.07793, simple_loss=0.1095, pruned_loss=0.01476, audio_tagging_loss=0.008429, over 14924.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09019, pruned_loss=0.01245, audio_tagging_loss=0.009044, over 3037431.69 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:51:59,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3142986.6666666665, ans=0.125 2023-11-26 00:52:00,124 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-26 00:52:04,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3142986.6666666665, ans=0.125 2023-11-26 00:52:15,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5 2023-11-26 00:52:23,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3143120.0, ans=0.125 2023-11-26 00:52:24,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-26 00:52:33,323 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2550, loss[loss=0.06546, simple_loss=0.08977, pruned_loss=0.01249, audio_tagging_loss=0.008084, over 15782.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09026, pruned_loss=0.01248, audio_tagging_loss=0.00904, over 3039047.84 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:52:35,972 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-11-26 00:52:38,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3143186.6666666665, ans=0.2 2023-11-26 00:52:54,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471500 2023-11-26 00:53:03,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.571e+01 9.048e+01 1.003e+02 1.375e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-26 00:53:05,681 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:53:27,800 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2600, loss[loss=0.04188, simple_loss=0.05495, pruned_loss=0.003809, audio_tagging_loss=0.01059, over 14939.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08989, pruned_loss=0.01247, audio_tagging_loss=0.009037, over 3042104.91 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:53:35,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2023-11-26 00:53:39,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3143586.6666666665, ans=0.125 2023-11-26 00:53:49,065 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471550 2023-11-26 00:53:53,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2023-11-26 00:54:02,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3143720.0, ans=0.125 2023-11-26 00:54:13,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2023-11-26 00:54:16,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3143786.6666666665, ans=0.125 2023-11-26 00:54:17,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3143786.6666666665, ans=0.1 2023-11-26 00:54:22,452 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2650, loss[loss=0.03953, simple_loss=0.04677, pruned_loss=0.007231, audio_tagging_loss=0.008917, over 14513.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08957, pruned_loss=0.01241, audio_tagging_loss=0.009012, over 3035689.11 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:54:26,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3143853.3333333335, ans=0.125 2023-11-26 00:54:44,941 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471600 2023-11-26 00:54:46,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3143986.6666666665, ans=0.125 2023-11-26 00:54:54,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.622e+01 9.253e+01 1.030e+02 1.251e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 00:55:10,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3144120.0, ans=0.1 2023-11-26 00:55:18,668 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2700, loss[loss=0.06443, simple_loss=0.07168, pruned_loss=0.01634, audio_tagging_loss=0.01225, over 13518.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0901, pruned_loss=0.01249, audio_tagging_loss=0.008914, over 3037174.53 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:55:22,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3144186.6666666665, ans=0.2 2023-11-26 00:55:27,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3144186.6666666665, ans=0.125 2023-11-26 00:55:32,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3144253.3333333335, ans=0.125 2023-11-26 00:55:33,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3144253.3333333335, ans=0.125 2023-11-26 00:55:39,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3144253.3333333335, ans=0.1 2023-11-26 00:55:41,223 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-26 00:56:15,107 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2750, loss[loss=0.07316, simple_loss=0.09645, pruned_loss=0.01621, audio_tagging_loss=0.008725, over 14462.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08986, pruned_loss=0.01261, audio_tagging_loss=0.008847, over 3034948.36 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:56:36,217 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-26 00:56:45,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.564e+01 9.385e+01 1.025e+02 1.216e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 00:57:03,943 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:57:10,322 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2800, loss[loss=0.06286, simple_loss=0.08968, pruned_loss=0.01204, audio_tagging_loss=0.005979, over 14926.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08953, pruned_loss=0.01242, audio_tagging_loss=0.008873, over 3033884.62 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:57:20,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3144920.0, ans=0.0 2023-11-26 00:57:33,146 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-26 00:57:37,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3144986.6666666665, ans=0.125 2023-11-26 00:57:54,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3145120.0, ans=0.0 2023-11-26 00:57:57,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0 2023-11-26 00:58:05,892 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2850, loss[loss=0.06361, simple_loss=0.08785, pruned_loss=0.01075, audio_tagging_loss=0.008934, over 14826.00 frames. 
], tot_loss[loss=0.06564, simple_loss=0.08905, pruned_loss=0.01224, audio_tagging_loss=0.008875, over 3030417.28 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:58:14,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3145186.6666666665, ans=0.125 2023-11-26 00:58:28,866 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-26 00:58:35,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3145320.0, ans=0.125 2023-11-26 00:58:38,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 8.895e+01 9.329e+01 9.789e+01 1.221e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 00:58:41,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3145386.6666666665, ans=0.0 2023-11-26 00:58:59,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2023-11-26 00:59:02,280 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2900, loss[loss=0.07509, simple_loss=0.1054, pruned_loss=0.01452, audio_tagging_loss=0.007884, over 14741.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09011, pruned_loss=0.0125, audio_tagging_loss=0.008785, over 3031477.53 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:59:05,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3145520.0, ans=0.2 2023-11-26 00:59:12,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2023-11-26 00:59:18,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3145586.6666666665, ans=0.125 2023-11-26 00:59:24,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-26 00:59:28,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-26 00:59:46,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3145786.6666666665, ans=0.1 2023-11-26 00:59:58,036 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2950, loss[loss=0.04815, simple_loss=0.06494, pruned_loss=0.007576, audio_tagging_loss=0.008103, over 14828.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08961, pruned_loss=0.01241, audio_tagging_loss=0.008867, over 3031212.11 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:59:58,327 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:00:01,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3145853.3333333335, ans=0.09899494936611666 2023-11-26 01:00:20,338 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-26 01:00:31,958 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.351e+01 9.999e+01 2.175e+02, threshold=1.870e+02, percent-clipped=2.0 2023-11-26 01:00:46,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3146120.0, ans=0.125 2023-11-26 01:00:47,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3146120.0, ans=0.1 2023-11-26 01:00:53,324 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3000, loss[loss=0.06859, simple_loss=0.0863, pruned_loss=0.01335, audio_tagging_loss=0.01209, over 14375.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09091, pruned_loss=0.01273, audio_tagging_loss=0.008957, over 3034195.91 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:00:53,325 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 01:01:20,281 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2482, 3.0624, 3.3131, 3.0393, 3.7515, 3.7817, 3.2719, 3.2433], device='cuda:2') 2023-11-26 01:01:20,293 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9691, 3.1764, 2.9849, 3.0792, 3.4065, 2.8655, 3.4450, 2.7019], device='cuda:2') 2023-11-26 01:01:25,515 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.05777, simple_loss=0.05069, pruned_loss=0.005189, audio_tagging_loss=0.02724, over 4681554.00 frames. 2023-11-26 01:01:25,515 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 01:01:42,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3146253.3333333335, ans=0.0 2023-11-26 01:01:46,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-26 01:02:20,660 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3050, loss[loss=0.07955, simple_loss=0.1236, pruned_loss=0.01289, audio_tagging_loss=0.004847, over 14006.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09181, pruned_loss=0.01285, audio_tagging_loss=0.008921, over 3028910.33 frames. 
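], batch size: 52, lr: 1.70e-03, grad_scale: 8.0

The grad_scale field in the batch records is the dynamic fp16 loss scale: it is halved when gradients overflow and periodically doubled once steps succeed again, visible here as the drop from 16.0 at batch 2950 to 8.0 at batch 3000 and the recovery to 16.0 by batch 3200 and 32.0 by batch 3600. This is the standard torch.cuda.amp.GradScaler pattern, sketched below with a stand-in model and optimizer:

    # Standard AMP training step with a dynamic loss scale, the mechanism
    # behind the logged grad_scale values. The model/optimizer are stand-ins.
    import torch

    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1.70e-03)
    scaler = torch.cuda.amp.GradScaler()

    for _ in range(10):
        x = torch.randn(56, 80, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(x).square().mean()
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # step is skipped if grads overflowed
        scaler.update()                # halve on overflow, double periodically
    print("grad_scale:", scaler.get_scale())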
2023-11-26 01:02:20,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3146520.0, ans=0.025 2023-11-26 01:02:33,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3146586.6666666665, ans=0.1 2023-11-26 01:02:38,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3146586.6666666665, ans=0.0 2023-11-26 01:02:42,821 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-26 01:02:47,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3146653.3333333335, ans=0.1 2023-11-26 01:02:53,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3146653.3333333335, ans=0.0 2023-11-26 01:02:56,921 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.712e+01 9.411e+01 1.021e+02 1.458e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 01:02:56,993 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:03:00,323 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:03:00,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-26 01:03:06,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3146786.6666666665, ans=0.0 2023-11-26 01:03:18,240 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3100, loss[loss=0.06017, simple_loss=0.08632, pruned_loss=0.008382, audio_tagging_loss=0.008633, over 13987.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09089, pruned_loss=0.01275, audio_tagging_loss=0.009054, over 3026376.22 frames. ], batch size: 51, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:03:23,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3146853.3333333335, ans=0.125 2023-11-26 01:03:41,114 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-26 01:03:56,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3147053.3333333335, ans=0.125 2023-11-26 01:04:12,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3147186.6666666665, ans=0.125 2023-11-26 01:04:14,260 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3150, loss[loss=0.05095, simple_loss=0.06516, pruned_loss=0.007023, audio_tagging_loss=0.01134, over 15004.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09107, pruned_loss=0.01274, audio_tagging_loss=0.009026, over 3034595.02 frames.
], batch size: 58, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:04:16,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3147186.6666666665, ans=0.0 2023-11-26 01:04:17,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3147186.6666666665, ans=0.2 2023-11-26 01:04:21,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3147186.6666666665, ans=0.125 2023-11-26 01:04:22,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3147186.6666666665, ans=0.0 2023-11-26 01:04:36,257 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472100 2023-11-26 01:04:47,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.868e+01 9.358e+01 9.908e+01 1.230e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 01:05:00,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2023-11-26 01:05:09,124 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:05:09,981 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3200, loss[loss=0.04806, simple_loss=0.0562, pruned_loss=0.01086, audio_tagging_loss=0.009097, over 16162.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09171, pruned_loss=0.01299, audio_tagging_loss=0.009017, over 3034478.39 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:05:11,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3147520.0, ans=0.025 2023-11-26 01:05:16,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3147520.0, ans=0.0 2023-11-26 01:05:32,068 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472150 2023-11-26 01:06:04,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3147853.3333333335, ans=0.125 2023-11-26 01:06:04,938 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3250, loss[loss=0.06873, simple_loss=0.09086, pruned_loss=0.01272, audio_tagging_loss=0.01058, over 14092.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09103, pruned_loss=0.01286, audio_tagging_loss=0.009167, over 3038788.13 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:06:06,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3147853.3333333335, ans=0.125 2023-11-26 01:06:09,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3147853.3333333335, ans=0.125 2023-11-26 01:06:11,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. 
limit=6.0 2023-11-26 01:06:12,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3147853.3333333335, ans=0.0 2023-11-26 01:06:27,275 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472200 2023-11-26 01:06:27,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3147986.6666666665, ans=0.1 2023-11-26 01:06:38,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.733e+01 9.362e+01 1.015e+02 1.651e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 01:07:01,106 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3300, loss[loss=0.08849, simple_loss=0.1243, pruned_loss=0.01632, audio_tagging_loss=0.009997, over 15688.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09134, pruned_loss=0.01288, audio_tagging_loss=0.009234, over 3048850.03 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:07:07,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3148186.6666666665, ans=0.0 2023-11-26 01:07:23,463 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472250 2023-11-26 01:07:37,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3148386.6666666665, ans=0.125 2023-11-26 01:07:57,001 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3350, loss[loss=0.05433, simple_loss=0.07283, pruned_loss=0.008712, audio_tagging_loss=0.009197, over 16271.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09191, pruned_loss=0.01297, audio_tagging_loss=0.009037, over 3047804.51 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:07:57,249 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:08:04,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3148520.0, ans=0.125 2023-11-26 01:08:07,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3148586.6666666665, ans=0.0 2023-11-26 01:08:19,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472300 2023-11-26 01:08:28,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3148653.3333333335, ans=0.95 2023-11-26 01:08:30,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3148720.0, ans=0.1 2023-11-26 01:08:30,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.683e+01 9.249e+01 1.019e+02 1.203e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 01:08:36,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3148720.0, ans=0.125 2023-11-26 01:08:46,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2023-11-26 01:08:52,802 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3400, loss[loss=0.08621, simple_loss=0.1168, pruned_loss=0.02071, audio_tagging_loss=0.007101, over 14151.00 frames. 
], tot_loss[loss=0.06842, simple_loss=0.09236, pruned_loss=0.01319, audio_tagging_loss=0.00905, over 3041091.98 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:09:07,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3148920.0, ans=0.95 2023-11-26 01:09:15,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472350 2023-11-26 01:09:38,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3149120.0, ans=0.125 2023-11-26 01:09:48,872 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3450, loss[loss=0.06322, simple_loss=0.08803, pruned_loss=0.00986, audio_tagging_loss=0.009339, over 15499.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.0919, pruned_loss=0.01318, audio_tagging_loss=0.008919, over 3047222.89 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:10:11,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472400 2023-11-26 01:10:11,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3149320.0, ans=0.2 2023-11-26 01:10:15,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3149320.0, ans=0.0 2023-11-26 01:10:22,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.810e+01 9.451e+01 1.004e+02 1.366e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 01:10:26,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3149386.6666666665, ans=0.2 2023-11-26 01:10:31,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3149386.6666666665, ans=0.1 2023-11-26 01:10:37,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.08 vs. limit=22.5 2023-11-26 01:10:39,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3149453.3333333335, ans=0.0 2023-11-26 01:10:44,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3149520.0, ans=0.125 2023-11-26 01:10:45,052 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3500, loss[loss=0.08345, simple_loss=0.1146, pruned_loss=0.01732, audio_tagging_loss=0.008812, over 15851.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09238, pruned_loss=0.01323, audio_tagging_loss=0.008789, over 3047954.29 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:11:01,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3149586.6666666665, ans=0.07 2023-11-26 01:11:08,034 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472450 2023-11-26 01:11:15,500 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 01:11:23,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3149720.0, ans=0.125 2023-11-26 01:11:32,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3149786.6666666665, ans=0.5 2023-11-26 01:11:40,872 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3550, loss[loss=0.04515, simple_loss=0.05852, pruned_loss=0.007564, audio_tagging_loss=0.008327, over 15535.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09096, pruned_loss=0.01297, audio_tagging_loss=0.008813, over 3045933.35 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:11:59,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3149920.0, ans=0.125 2023-11-26 01:12:04,002 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472500 2023-11-26 01:12:04,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3149986.6666666665, ans=0.0 2023-11-26 01:12:04,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3149986.6666666665, ans=0.04949747468305833 2023-11-26 01:12:14,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.583e+01 9.059e+01 9.596e+01 1.364e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-26 01:12:30,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3150120.0, ans=0.1 2023-11-26 01:12:35,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3150120.0, ans=0.0 2023-11-26 01:12:37,549 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3600, loss[loss=0.08114, simple_loss=0.1067, pruned_loss=0.01784, audio_tagging_loss=0.009974, over 15608.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09009, pruned_loss=0.01279, audio_tagging_loss=0.008862, over 3045241.08 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:12:49,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3150253.3333333335, ans=0.125 2023-11-26 01:12:52,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-26 01:12:59,415 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472550 2023-11-26 01:12:59,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3150320.0, ans=0.125 2023-11-26 01:13:16,239 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:13:33,440 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3650, loss[loss=0.0801, simple_loss=0.1154, pruned_loss=0.01507, audio_tagging_loss=0.00732, over 15562.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09084, pruned_loss=0.01297, audio_tagging_loss=0.008853, over 3047178.97 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:13:33,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.68 vs. 
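The WARNING at 01:11:15 documents a sanity filter on the transducer loss: AudioSet clips carry only a dummy transcript, and after the encoder's ~4x subsampling a 1-second cut keeps just 23 frames, fewer than its 24 BPE tokens, so no alignment is possible and the cut is dropped from ASR training. A hedged reconstruction of the rule (the subsampling formula below is one plausible choice that reproduces 100 -> 23; the actual encoder_embed may differ):

```python
def frames_after_subsampling(t: int) -> int:
    # illustrative ~4x convolutional subsampling with edge trimming (100 -> 23)
    return ((t - 7) // 2 + 1 - 3) // 2 + 1

def keep_cut(num_frames_before: int, num_tokens: int) -> bool:
    # a cut survives only if it has at least as many encoder frames as tokens
    return frames_after_subsampling(num_frames_before) >= num_tokens

print(frames_after_subsampling(100))   # 23
print(keep_cut(100, 24))               # False -> "Exclude cut ..." as logged
```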
limit=22.5 2023-11-26 01:13:40,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3150520.0, ans=0.1 2023-11-26 01:13:44,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3150586.6666666665, ans=0.07 2023-11-26 01:13:49,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3150586.6666666665, ans=0.0 2023-11-26 01:13:55,253 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472600 2023-11-26 01:13:55,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3150653.3333333335, ans=0.125 2023-11-26 01:14:08,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.763e+01 9.129e+01 9.774e+01 1.635e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-26 01:14:24,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3150786.6666666665, ans=0.09899494936611666 2023-11-26 01:14:27,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=22.5 2023-11-26 01:14:28,725 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3700, loss[loss=0.07655, simple_loss=0.1029, pruned_loss=0.01624, audio_tagging_loss=0.008887, over 15990.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09067, pruned_loss=0.01287, audio_tagging_loss=0.008815, over 3049257.30 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:14:32,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3150853.3333333335, ans=0.1 2023-11-26 01:14:45,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3150920.0, ans=0.1 2023-11-26 01:14:52,250 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472650 2023-11-26 01:14:59,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3150986.6666666665, ans=0.125 2023-11-26 01:15:09,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0 2023-11-26 01:15:10,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3151053.3333333335, ans=0.5 2023-11-26 01:15:20,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3151120.0, ans=10.0 2023-11-26 01:15:25,865 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3750, loss[loss=0.07322, simple_loss=0.1064, pruned_loss=0.01368, audio_tagging_loss=0.006337, over 15175.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09034, pruned_loss=0.01275, audio_tagging_loss=0.008804, over 3051077.78 frames. 
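The learning-rate column decays very slowly (1.70e-03 above, 1.69e-03 from batch 3550 on), as expected of an inverse-power schedule in both step and epoch. A sketch in the style of icefall's Eden scheduler; treat the functional form and its constants as assumptions rather than a quotation of the training code:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    # lr shrinks roughly like step**-0.5 and epoch**-0.5 once past the knees
    step_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * step_factor * epoch_factor
```

At step ~4.7e5 both factors change by well under 0.1% across a few hundred batches, which is why the printed lr moves only in the third significant digit.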
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:15:47,756 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472700 2023-11-26 01:15:53,151 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:15:54,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3151320.0, ans=0.95 2023-11-26 01:15:57,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-26 01:15:59,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.900e+01 9.433e+01 1.035e+02 1.729e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 01:16:06,227 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:16:07,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3151386.6666666665, ans=0.0 2023-11-26 01:16:21,664 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3800, loss[loss=0.07613, simple_loss=0.1082, pruned_loss=0.01352, audio_tagging_loss=0.008531, over 14222.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09141, pruned_loss=0.01286, audio_tagging_loss=0.008814, over 3050448.43 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:16:28,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3151520.0, ans=0.0 2023-11-26 01:16:41,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3151586.6666666665, ans=0.04949747468305833 2023-11-26 01:16:43,099 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472750 2023-11-26 01:16:45,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3151653.3333333335, ans=0.2 2023-11-26 01:16:49,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3151653.3333333335, ans=0.0 2023-11-26 01:16:54,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3151720.0, ans=0.125 2023-11-26 01:17:02,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2023-11-26 01:17:16,307 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3850, loss[loss=0.06571, simple_loss=0.08019, pruned_loss=0.01546, audio_tagging_loss=0.01016, over 14521.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.0921, pruned_loss=0.01297, audio_tagging_loss=0.00879, over 3047089.52 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:17:21,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.22 vs. 
limit=12.0 2023-11-26 01:17:39,146 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472800 2023-11-26 01:17:51,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.590e+01 9.252e+01 9.700e+01 1.619e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 01:18:01,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3152120.0, ans=0.1 2023-11-26 01:18:02,151 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:18:13,114 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3900, loss[loss=0.0764, simple_loss=0.102, pruned_loss=0.01794, audio_tagging_loss=0.007446, over 15929.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09188, pruned_loss=0.01294, audio_tagging_loss=0.008871, over 3046901.09 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:18:15,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-26 01:18:16,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-26 01:18:17,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-26 01:18:24,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-26 01:18:34,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472850 2023-11-26 01:18:52,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3152386.6666666665, ans=0.125 2023-11-26 01:19:04,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3152453.3333333335, ans=0.125 2023-11-26 01:19:04,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3152453.3333333335, ans=0.125 2023-11-26 01:19:08,180 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3950, loss[loss=0.06026, simple_loss=0.08155, pruned_loss=0.01176, audio_tagging_loss=0.007725, over 14404.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09118, pruned_loss=0.01282, audio_tagging_loss=0.008995, over 3052967.50 frames. 
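The `optim.py:476` lines are internally consistent: in each one the threshold is exactly twice the middle quartile (e.g. 1.872e+02 = 2.0 x 9.358e+01 at 01:04:47, and 1.850e+02 = 2.0 x 9.252e+01 at 01:08:30), matching the printed Clipping_scale=2.0, and percent-clipped is the share of recent gradient norms above that threshold. A bookkeeping sketch (assumed, not the optimizer's actual code):

```python
import torch

def clip_threshold(recent_norms: list[float], clipping_scale: float = 2.0) -> float:
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()          # 2 x median
    clipped = 100.0 * sum(n > threshold for n in recent_norms) / len(recent_norms)
    print("grad-norm quartiles",
          " ".join(f"{v:.3e}" for v in q.tolist()),
          f"threshold={threshold:.3e}, percent-clipped={clipped:.1f}")
    return threshold
```

The single nonzero percent-clipped later in this section (1.0 at 01:44:00) coincides with the one window whose max norm, 2.043e+02, exceeds its 1.875e+02 threshold.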
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:19:08,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3152520.0, ans=0.125 2023-11-26 01:19:21,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3152586.6666666665, ans=0.125 2023-11-26 01:19:29,415 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472900 2023-11-26 01:19:36,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3152653.3333333335, ans=0.125 2023-11-26 01:19:40,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3152720.0, ans=0.125 2023-11-26 01:19:42,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.671e+01 9.267e+01 9.996e+01 1.170e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 01:19:47,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2023-11-26 01:20:00,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3152786.6666666665, ans=0.025 2023-11-26 01:20:03,229 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4000, loss[loss=0.07566, simple_loss=0.0984, pruned_loss=0.01609, audio_tagging_loss=0.01037, over 15666.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09131, pruned_loss=0.01292, audio_tagging_loss=0.008998, over 3053624.73 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:20:13,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3152920.0, ans=0.0 2023-11-26 01:20:21,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3152920.0, ans=0.015 2023-11-26 01:20:25,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3152986.6666666665, ans=0.1 2023-11-26 01:20:26,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472950 2023-11-26 01:20:26,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-26 01:20:37,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=12.0 2023-11-26 01:20:38,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3153053.3333333335, ans=0.0 2023-11-26 01:20:40,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3153053.3333333335, ans=0.2 2023-11-26 01:20:53,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.32 vs. 
limit=15.0 2023-11-26 01:20:54,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153120.0, ans=0.1 2023-11-26 01:20:58,615 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4050, loss[loss=0.06244, simple_loss=0.07022, pruned_loss=0.01342, audio_tagging_loss=0.0139, over 16017.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09134, pruned_loss=0.0128, audio_tagging_loss=0.00904, over 3045985.45 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:21:00,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3153186.6666666665, ans=0.125 2023-11-26 01:21:00,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3153186.6666666665, ans=0.2 2023-11-26 01:21:03,963 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:21:19,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3153253.3333333335, ans=0.125 2023-11-26 01:21:21,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3153320.0, ans=15.0 2023-11-26 01:21:22,156 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473000 2023-11-26 01:21:35,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.884e+01 9.464e+01 1.024e+02 1.208e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 01:21:35,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3153386.6666666665, ans=0.0 2023-11-26 01:21:40,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153386.6666666665, ans=0.1 2023-11-26 01:21:47,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3153453.3333333335, ans=0.0 2023-11-26 01:21:55,902 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4100, loss[loss=0.066, simple_loss=0.09051, pruned_loss=0.01179, audio_tagging_loss=0.008962, over 15448.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09132, pruned_loss=0.01278, audio_tagging_loss=0.009096, over 3041334.31 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:22:07,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3153586.6666666665, ans=0.125 2023-11-26 01:22:09,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.41 vs. 
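Batch sizes in these reports wander between roughly 53 and 64 because batches are assembled by total duration, not by count: the sampler packs cuts until a duration cap, so batches of shorter clips hold more cuts. A toy version of duration-capped batching (illustrative only, not lhotse's SimpleCutSampler):

```python
def duration_batches(cuts, max_duration: float):
    """cuts: iterable of (cut_id, duration_seconds); yields lists of cuts."""
    batch, total = [], 0.0
    for cut in cuts:
        if batch and total + cut[1] > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut)
        total += cut[1]
    if batch:
        yield batch
```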
limit=15.0 2023-11-26 01:22:17,700 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473050 2023-11-26 01:22:18,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3153653.3333333335, ans=0.125 2023-11-26 01:22:43,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3153786.6666666665, ans=0.1 2023-11-26 01:22:51,522 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4150, loss[loss=0.07284, simple_loss=0.1078, pruned_loss=0.01086, audio_tagging_loss=0.008081, over 15095.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09171, pruned_loss=0.01287, audio_tagging_loss=0.008874, over 3041011.93 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:22:52,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3153853.3333333335, ans=0.125 2023-11-26 01:23:01,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=10.0 2023-11-26 01:23:13,778 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473100 2023-11-26 01:23:27,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.761e+01 9.353e+01 9.782e+01 1.109e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 01:23:32,739 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:23:46,540 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4200, loss[loss=0.05022, simple_loss=0.063, pruned_loss=0.008171, audio_tagging_loss=0.01055, over 13860.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09078, pruned_loss=0.01274, audio_tagging_loss=0.008859, over 3043504.14 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:23:53,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3154186.6666666665, ans=0.07 2023-11-26 01:23:59,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3154253.3333333335, ans=0.1 2023-11-26 01:24:10,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473150 2023-11-26 01:24:10,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3154320.0, ans=0.125 2023-11-26 01:24:21,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3154386.6666666665, ans=0.07 2023-11-26 01:24:42,952 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4250, loss[loss=0.0591, simple_loss=0.08527, pruned_loss=0.007022, audio_tagging_loss=0.009447, over 15221.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09174, pruned_loss=0.01292, audio_tagging_loss=0.008792, over 3048665.04 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:24:45,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3154520.0, ans=0.125 2023-11-26 01:24:59,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3154586.6666666665, ans=0.125 2023-11-26 01:25:00,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3154586.6666666665, ans=0.1 2023-11-26 01:25:05,331 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473200 2023-11-26 01:25:08,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0 2023-11-26 01:25:19,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.719e+01 9.230e+01 1.020e+02 1.385e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-26 01:25:25,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3154720.0, ans=0.125 2023-11-26 01:25:39,086 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4300, loss[loss=0.07129, simple_loss=0.1049, pruned_loss=0.01262, audio_tagging_loss=0.006197, over 14958.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09149, pruned_loss=0.01288, audio_tagging_loss=0.00876, over 3048250.99 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:26:00,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3154986.6666666665, ans=0.2 2023-11-26 01:26:01,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473250 2023-11-26 01:26:05,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2023-11-26 01:26:16,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3155053.3333333335, ans=0.0 2023-11-26 01:26:18,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2023-11-26 01:26:34,030 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4350, loss[loss=0.06024, simple_loss=0.08323, pruned_loss=0.008598, audio_tagging_loss=0.01003, over 15039.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09129, pruned_loss=0.01284, audio_tagging_loss=0.008703, over 3050312.37 frames. 
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:26:36,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3155186.6666666665, ans=0.05 2023-11-26 01:26:53,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3155253.3333333335, ans=0.125 2023-11-26 01:26:55,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3155320.0, ans=0.1 2023-11-26 01:26:56,912 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473300 2023-11-26 01:27:08,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2023-11-26 01:27:09,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.594e+01 9.290e+01 1.001e+02 1.319e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 01:27:30,056 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4400, loss[loss=0.05864, simple_loss=0.08972, pruned_loss=0.007802, audio_tagging_loss=0.005975, over 16194.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09169, pruned_loss=0.01278, audio_tagging_loss=0.008637, over 3058603.97 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:27:52,525 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473350 2023-11-26 01:27:54,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3155653.3333333335, ans=0.2 2023-11-26 01:28:08,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=12.0 2023-11-26 01:28:22,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3155786.6666666665, ans=0.1 2023-11-26 01:28:26,503 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4450, loss[loss=0.05636, simple_loss=0.06703, pruned_loss=0.01223, audio_tagging_loss=0.01062, over 15368.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09173, pruned_loss=0.01268, audio_tagging_loss=0.008617, over 3053876.53 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:28:28,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. 
limit=15.0 2023-11-26 01:28:32,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3155853.3333333335, ans=0.1 2023-11-26 01:28:41,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3155920.0, ans=0.0 2023-11-26 01:28:41,670 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:28:48,836 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473400 2023-11-26 01:29:02,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.765e+01 9.296e+01 9.987e+01 1.152e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 01:29:02,708 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:29:16,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3156120.0, ans=0.125 2023-11-26 01:29:22,005 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4500, loss[loss=0.07644, simple_loss=0.09894, pruned_loss=0.01644, audio_tagging_loss=0.01053, over 15942.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.0908, pruned_loss=0.01254, audio_tagging_loss=0.008736, over 3058929.30 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:29:44,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473450 2023-11-26 01:29:47,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3156320.0, ans=0.125 2023-11-26 01:29:57,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3156386.6666666665, ans=0.05 2023-11-26 01:30:00,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3156386.6666666665, ans=0.2 2023-11-26 01:30:07,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3156453.3333333335, ans=0.125 2023-11-26 01:30:15,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3156453.3333333335, ans=0.1 2023-11-26 01:30:18,336 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4550, loss[loss=0.07001, simple_loss=0.0962, pruned_loss=0.01595, audio_tagging_loss=0.00596, over 17141.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09086, pruned_loss=0.0126, audio_tagging_loss=0.008762, over 3060050.93 frames. ], batch size: 64, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:30:31,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3156586.6666666665, ans=0.035 2023-11-26 01:30:36,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3156586.6666666665, ans=0.2 2023-11-26 01:30:40,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473500 2023-11-26 01:30:40,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3156653.3333333335, ans=0.1 2023-11-26 01:30:47,567 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. 
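The grad_scale field has been hopping between 8.0, 16.0 and 32.0 in the batch headers above; with fp16 training this is the dynamic loss scale, halved whenever a step overflows to inf/NaN and grown back after a run of clean steps. The standard PyTorch AMP pattern that produces this trace (a generic sketch, not the training script itself):

```python
import torch

scaler = torch.cuda.amp.GradScaler()      # backoff 0.5, growth 2.0 by default

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)               # assumes the model returns a scalar loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)                # skipped on inf/NaN; scale is halved
    scaler.update()                       # doubled after enough clean steps
    return scaler.get_scale()             # the grad_scale printed in the log
```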
limit=15.0 2023-11-26 01:30:56,712 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.762e+01 9.232e+01 1.004e+02 1.439e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-26 01:31:02,003 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:31:08,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3156786.6666666665, ans=0.1 2023-11-26 01:31:14,177 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4600, loss[loss=0.06501, simple_loss=0.09572, pruned_loss=0.01004, audio_tagging_loss=0.0071, over 15809.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09068, pruned_loss=0.01264, audio_tagging_loss=0.008791, over 3051350.04 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:31:26,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3156920.0, ans=0.0 2023-11-26 01:31:27,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3156920.0, ans=0.125 2023-11-26 01:31:29,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3156920.0, ans=0.125 2023-11-26 01:31:30,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3156920.0, ans=0.125 2023-11-26 01:31:31,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2023-11-26 01:31:32,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3156920.0, ans=0.125 2023-11-26 01:31:35,981 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473550 2023-11-26 01:31:53,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2023-11-26 01:32:10,070 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4650, loss[loss=0.05852, simple_loss=0.07502, pruned_loss=0.009897, audio_tagging_loss=0.01112, over 15044.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09029, pruned_loss=0.0124, audio_tagging_loss=0.008881, over 3058598.24 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:32:14,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3157186.6666666665, ans=0.1 2023-11-26 01:32:15,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3157186.6666666665, ans=0.125 2023-11-26 01:32:19,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.49 vs. 
limit=15.0 2023-11-26 01:32:32,769 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473600 2023-11-26 01:32:36,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2023-11-26 01:32:38,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3157320.0, ans=0.125 2023-11-26 01:32:45,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.43 vs. limit=10.0 2023-11-26 01:32:47,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.767e+01 9.190e+01 1.011e+02 1.594e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-26 01:32:52,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3157386.6666666665, ans=0.125 2023-11-26 01:32:57,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3157453.3333333335, ans=0.125 2023-11-26 01:33:05,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-26 01:33:06,568 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4700, loss[loss=0.07652, simple_loss=0.1131, pruned_loss=0.01294, audio_tagging_loss=0.007027, over 14946.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09013, pruned_loss=0.01258, audio_tagging_loss=0.009016, over 3060058.77 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:33:11,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2023-11-26 01:33:16,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3157520.0, ans=0.0 2023-11-26 01:33:20,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3157586.6666666665, ans=0.07 2023-11-26 01:33:28,385 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473650 2023-11-26 01:33:29,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3157653.3333333335, ans=0.2 2023-11-26 01:33:34,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3157653.3333333335, ans=0.0 2023-11-26 01:33:39,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3157720.0, ans=0.0 2023-11-26 01:34:02,325 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4750, loss[loss=0.07382, simple_loss=0.1028, pruned_loss=0.01246, audio_tagging_loss=0.009948, over 14049.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09063, pruned_loss=0.01264, audio_tagging_loss=0.009134, over 3053173.48 frames. 
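The `scaling.py:1022` Whitening lines compare a per-module statistic against a limit (metric=8.43 vs. limit=10.0 just above): the metric gauges how far the activation covariance within each channel group is from isotropic, scoring 1.0 for a perfectly white covariance, and a corrective gradient engages only above the limit. One plausible formulation of such a metric (an assumption; the exact estimator in scaling.py may differ):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (num_frames, num_channels); channels split into equal groups."""
    n, c = x.shape
    xg = x.reshape(n, num_groups, c // num_groups)
    total = 0.0
    for g in range(num_groups):
        cov = xg[:, g, :].T @ xg[:, g, :] / n
        d = cov.shape[0]
        # d * tr(cov^2) / tr(cov)^2 >= 1, with equality iff all eigenvalues
        # are equal, i.e. the covariance is already white
        total += (d * (cov @ cov).trace() / cov.trace() ** 2).item()
    return total / num_groups

x = torch.randn(1000, 256)        # near-white activations
print(whitening_metric(x, 4))     # ~1.0, comfortably under a limit of 10.0
```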
], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:34:12,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3157920.0, ans=0.1 2023-11-26 01:34:14,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3157920.0, ans=0.2 2023-11-26 01:34:24,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473700 2023-11-26 01:34:40,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.549e+01 9.197e+01 1.001e+02 1.331e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 01:34:45,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3158053.3333333335, ans=0.125 2023-11-26 01:34:48,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3158120.0, ans=0.125 2023-11-26 01:34:57,732 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4800, loss[loss=0.07719, simple_loss=0.1072, pruned_loss=0.01584, audio_tagging_loss=0.007771, over 14976.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09014, pruned_loss=0.0125, audio_tagging_loss=0.009139, over 3048890.41 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:35:20,595 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473750 2023-11-26 01:35:54,531 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4850, loss[loss=0.04508, simple_loss=0.05729, pruned_loss=0.006223, audio_tagging_loss=0.01021, over 13813.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08983, pruned_loss=0.0125, audio_tagging_loss=0.009224, over 3041456.61 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:35:56,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3158520.0, ans=0.125 2023-11-26 01:36:11,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3158586.6666666665, ans=0.125 2023-11-26 01:36:16,362 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473800 2023-11-26 01:36:16,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3158653.3333333335, ans=0.125 2023-11-26 01:36:26,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3158720.0, ans=0.2 2023-11-26 01:36:32,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.661e+01 9.165e+01 9.886e+01 1.284e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 01:36:36,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3158720.0, ans=0.2 2023-11-26 01:36:40,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3158786.6666666665, ans=0.125 2023-11-26 01:36:46,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3158786.6666666665, ans=0.1 2023-11-26 01:36:48,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3158786.6666666665, ans=0.1 2023-11-26 01:36:50,746 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4900, 
loss[loss=0.07566, simple_loss=0.1074, pruned_loss=0.01679, audio_tagging_loss=0.005158, over 15418.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09012, pruned_loss=0.01261, audio_tagging_loss=0.009066, over 3041749.86 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:37:10,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3158920.0, ans=0.125 2023-11-26 01:37:12,767 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473850 2023-11-26 01:37:38,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.0 2023-11-26 01:37:39,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3159120.0, ans=0.125 2023-11-26 01:37:43,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2023-11-26 01:37:46,067 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4950, loss[loss=0.05988, simple_loss=0.08323, pruned_loss=0.01099, audio_tagging_loss=0.00728, over 15564.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09063, pruned_loss=0.01265, audio_tagging_loss=0.009001, over 3039452.60 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:37:53,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2023-11-26 01:37:56,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3159253.3333333335, ans=0.1 2023-11-26 01:38:09,191 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473900 2023-11-26 01:38:21,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3159386.6666666665, ans=0.125 2023-11-26 01:38:23,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3159386.6666666665, ans=0.1 2023-11-26 01:38:24,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.576e+01 9.071e+01 1.006e+02 1.445e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-26 01:38:39,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0 2023-11-26 01:38:41,907 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5000, loss[loss=0.05613, simple_loss=0.07657, pruned_loss=0.008783, audio_tagging_loss=0.009058, over 15240.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09073, pruned_loss=0.01264, audio_tagging_loss=0.008777, over 3037716.20 frames. 
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:39:01,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3159586.6666666665, ans=0.5 2023-11-26 01:39:04,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473950 2023-11-26 01:39:16,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3159720.0, ans=0.1 2023-11-26 01:39:26,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3159786.6666666665, ans=0.04949747468305833 2023-11-26 01:39:28,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3159786.6666666665, ans=0.125 2023-11-26 01:39:37,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3159853.3333333335, ans=0.04949747468305833 2023-11-26 01:39:38,320 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5050, loss[loss=0.05935, simple_loss=0.07697, pruned_loss=0.01252, audio_tagging_loss=0.008346, over 14178.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09173, pruned_loss=0.01287, audio_tagging_loss=0.008703, over 3042186.77 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:39:59,831 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474000 2023-11-26 01:40:06,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.36 vs. limit=22.5 2023-11-26 01:40:07,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3159986.6666666665, ans=0.0 2023-11-26 01:40:08,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3159986.6666666665, ans=0.95 2023-11-26 01:40:10,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3159986.6666666665, ans=0.125 2023-11-26 01:40:16,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.830e+01 9.284e+01 9.877e+01 1.399e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 01:40:31,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3160120.0, ans=0.1 2023-11-26 01:40:34,111 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5100, loss[loss=0.05678, simple_loss=0.06959, pruned_loss=0.009983, audio_tagging_loss=0.012, over 14802.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09105, pruned_loss=0.01284, audio_tagging_loss=0.008746, over 3048906.80 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:40:38,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3160186.6666666665, ans=0.2 2023-11-26 01:40:56,377 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474050 2023-11-26 01:40:58,912 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.73 vs. 
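The loss reports in this stretch obey a fixed identity that pins down how the three objectives are combined: loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. the simple (trivial-joiner) transducer loss is down-weighted by half while the pruned transducer loss and the audio-tagging distillation loss enter at full weight. The weights below are inferred from the printed numbers, not read out of the code:

```python
def combined_loss(simple: float, pruned: float, tagging: float,
                  simple_scale: float = 0.5, tagging_scale: float = 1.0) -> float:
    return simple_scale * simple + pruned + tagging_scale * tagging

# tot_loss at epoch 40, batch 5000 (above):
print(round(combined_loss(0.09073, 0.01264, 0.008777), 5))   # 0.06678, as logged
```

The same check holds for every loss[...] and tot_loss[...] line in the section.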
limit=15.0 2023-11-26 01:41:08,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3160386.6666666665, ans=0.125 2023-11-26 01:41:28,750 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5150, loss[loss=0.06903, simple_loss=0.09191, pruned_loss=0.01603, audio_tagging_loss=0.007047, over 15945.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09011, pruned_loss=0.01275, audio_tagging_loss=0.008726, over 3043412.61 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:41:30,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3160520.0, ans=0.2 2023-11-26 01:41:51,708 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474100 2023-11-26 01:41:58,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3160653.3333333335, ans=0.125 2023-11-26 01:42:07,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.876e+01 9.273e+01 9.906e+01 1.225e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 01:42:25,186 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5200, loss[loss=0.05, simple_loss=0.05924, pruned_loss=0.008524, audio_tagging_loss=0.01186, over 14817.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09038, pruned_loss=0.01267, audio_tagging_loss=0.008707, over 3043198.38 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:42:29,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=22.5 2023-11-26 01:42:31,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3160853.3333333335, ans=0.1 2023-11-26 01:42:39,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3160920.0, ans=0.07 2023-11-26 01:42:46,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3160986.6666666665, ans=0.1 2023-11-26 01:42:46,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474150 2023-11-26 01:42:58,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3161053.3333333335, ans=0.125 2023-11-26 01:43:02,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3161053.3333333335, ans=0.2 2023-11-26 01:43:10,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-26 01:43:10,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=15.0 2023-11-26 01:43:20,741 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5250, loss[loss=0.05334, simple_loss=0.07755, pruned_loss=0.007968, audio_tagging_loss=0.006596, over 15347.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09009, pruned_loss=0.01264, audio_tagging_loss=0.00874, over 3043122.17 frames. 
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:43:30,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3161253.3333333335, ans=0.125 2023-11-26 01:43:36,869 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:43:43,077 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474200 2023-11-26 01:43:49,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3161320.0, ans=0.125 2023-11-26 01:43:51,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3161320.0, ans=0.0 2023-11-26 01:43:55,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3161386.6666666665, ans=0.1 2023-11-26 01:44:00,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.738e+01 9.374e+01 1.008e+02 2.043e+02, threshold=1.875e+02, percent-clipped=1.0 2023-11-26 01:44:07,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2023-11-26 01:44:16,255 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5300, loss[loss=0.06148, simple_loss=0.0859, pruned_loss=0.01017, audio_tagging_loss=0.008355, over 14042.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09008, pruned_loss=0.01257, audio_tagging_loss=0.008709, over 3044609.49 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:44:30,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.46 vs. limit=22.5 2023-11-26 01:44:31,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3161586.6666666665, ans=0.125 2023-11-26 01:44:32,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3161586.6666666665, ans=0.125 2023-11-26 01:44:38,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3161653.3333333335, ans=0.125 2023-11-26 01:44:39,575 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474250 2023-11-26 01:45:02,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3161786.6666666665, ans=0.125 2023-11-26 01:45:12,043 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5350, loss[loss=0.07488, simple_loss=0.09542, pruned_loss=0.0166, audio_tagging_loss=0.01057, over 15416.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09129, pruned_loss=0.01286, audio_tagging_loss=0.00864, over 3049084.13 frames. 
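The `scaling.py:1118` entries report the accumulated auxiliary penalty attached to self-attention weights; loss-sum=0.000e+00, as in every such line here, means the weights stayed inside their allowed range for the whole logging interval and no corrective gradient was injected. A hedged sketch of the idiom (the limit and penalty values are hypothetical, and icefall wires this in through custom autograd rather than an explicit added term):

```python
import torch

def attn_weight_penalty(w: torch.Tensor, limit: float = 25.0,
                        penalty: float = 1.0e-04) -> torch.Tensor:
    # zero unless some attention logits stray outside [-limit, limit]
    excess = (w.abs() - limit).clamp(min=0.0)
    return penalty * excess.sum()

w = torch.randn(4, 8, 100, 100, requires_grad=True)   # (heads, batch, t, s)
aux = attn_weight_penalty(w)
print(f"loss-sum={aux.item():.3e}")                   # 0.000e+00 when well-behaved
```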
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:45:17,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3161853.3333333335, ans=0.125 2023-11-26 01:45:33,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3161986.6666666665, ans=0.125 2023-11-26 01:45:33,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0 2023-11-26 01:45:34,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474300 2023-11-26 01:45:50,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3162053.3333333335, ans=0.0 2023-11-26 01:45:50,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.668e+01 9.369e+01 1.006e+02 1.281e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 01:45:51,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3162053.3333333335, ans=0.0 2023-11-26 01:46:08,068 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5400, loss[loss=0.05496, simple_loss=0.06413, pruned_loss=0.01275, audio_tagging_loss=0.01014, over 14159.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09089, pruned_loss=0.01284, audio_tagging_loss=0.008818, over 3049699.86 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:46:11,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3162186.6666666665, ans=0.1 2023-11-26 01:46:19,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3162253.3333333335, ans=0.125 2023-11-26 01:46:26,168 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:46:29,630 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474350 2023-11-26 01:47:02,406 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5450, loss[loss=0.06957, simple_loss=0.09058, pruned_loss=0.01456, audio_tagging_loss=0.00972, over 14560.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09148, pruned_loss=0.01295, audio_tagging_loss=0.008823, over 3059044.22 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:47:23,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3162586.6666666665, ans=0.1 2023-11-26 01:47:25,643 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474400 2023-11-26 01:47:30,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3162653.3333333335, ans=0.0 2023-11-26 01:47:41,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.837e+01 9.304e+01 1.039e+02 1.325e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 01:47:54,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=8.0 2023-11-26 01:47:58,525 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5500, loss[loss=0.05851, simple_loss=0.08121, pruned_loss=0.01039, audio_tagging_loss=0.00752, over 15368.00 frames. 
], tot_loss[loss=0.0677, simple_loss=0.09174, pruned_loss=0.01296, audio_tagging_loss=0.008861, over 3052704.06 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:48:18,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3162920.0, ans=0.0 2023-11-26 01:48:20,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3162986.6666666665, ans=0.0 2023-11-26 01:48:20,959 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474450 2023-11-26 01:48:24,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3162986.6666666665, ans=0.125 2023-11-26 01:48:40,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3163053.3333333335, ans=0.125 2023-11-26 01:48:44,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3163120.0, ans=0.125 2023-11-26 01:48:54,816 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5550, loss[loss=0.07515, simple_loss=0.09309, pruned_loss=0.01823, audio_tagging_loss=0.01038, over 13939.00 frames. ], tot_loss[loss=0.06846, simple_loss=0.09281, pruned_loss=0.01312, audio_tagging_loss=0.00893, over 3045477.25 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:49:12,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3163253.3333333335, ans=0.09899494936611666 2023-11-26 01:49:13,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3163253.3333333335, ans=0.1 2023-11-26 01:49:16,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474500 2023-11-26 01:49:34,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.884e+01 9.348e+01 1.017e+02 1.220e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 01:49:34,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163386.6666666665, ans=0.1 2023-11-26 01:49:43,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.79 vs. limit=10.0 2023-11-26 01:49:49,977 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5600, loss[loss=0.07174, simple_loss=0.09842, pruned_loss=0.01341, audio_tagging_loss=0.009121, over 14957.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09096, pruned_loss=0.01295, audio_tagging_loss=0.009165, over 3045558.17 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:50:01,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. 
limit=22.5 2023-11-26 01:50:08,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3163586.6666666665, ans=0.0 2023-11-26 01:50:12,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474550 2023-11-26 01:50:18,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3163653.3333333335, ans=0.125 2023-11-26 01:50:30,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3163720.0, ans=0.125 2023-11-26 01:50:31,296 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:50:31,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2023-11-26 01:50:46,149 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5650, loss[loss=0.06944, simple_loss=0.08943, pruned_loss=0.01374, audio_tagging_loss=0.01099, over 15218.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09054, pruned_loss=0.01272, audio_tagging_loss=0.009248, over 3046518.65 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:50:53,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3163853.3333333335, ans=0.2 2023-11-26 01:50:53,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3163853.3333333335, ans=0.0 2023-11-26 01:51:01,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3163920.0, ans=0.0 2023-11-26 01:51:09,083 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474600 2023-11-26 01:51:14,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3163986.6666666665, ans=0.0 2023-11-26 01:51:26,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.458e+01 9.048e+01 9.939e+01 1.364e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-26 01:51:31,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3164120.0, ans=0.1 2023-11-26 01:51:32,574 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0 2023-11-26 01:51:42,945 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5700, loss[loss=0.0705, simple_loss=0.09042, pruned_loss=0.01613, audio_tagging_loss=0.009159, over 16442.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08971, pruned_loss=0.01254, audio_tagging_loss=0.009253, over 3054815.93 frames. 
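The WARNING above is a length sanity check: after roughly 4x temporal subsampling, a 100-frame (one-second) cut keeps only 23 encoder frames, fewer than its 24 BPE tokens, and the pruned transducer loss cannot align more tokens than it has frames, so the cut is excluded. A hedged sketch of such a filter; the subsampling formula is illustrative, chosen to reproduce the logged 100 -> 23:

# Hedged sketch of the cut filter implied by the WARNING records.
def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 4            # (100 - 7) // 4 == 23

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    T = frames_after_subsampling(num_frames)
    return T >= num_tokens                  # need at least one frame per token

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)                # the excluded AudioSet dummy-text cut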
], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:52:04,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474650 2023-11-26 01:52:11,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3164320.0, ans=0.0 2023-11-26 01:52:33,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3164453.3333333335, ans=0.125 2023-11-26 01:52:36,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3164453.3333333335, ans=0.05 2023-11-26 01:52:38,750 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5750, loss[loss=0.07208, simple_loss=0.1003, pruned_loss=0.01332, audio_tagging_loss=0.008619, over 15675.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08922, pruned_loss=0.01241, audio_tagging_loss=0.009098, over 3054047.95 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:52:49,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-11-26 01:53:00,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3164653.3333333335, ans=0.125 2023-11-26 01:53:01,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474700 2023-11-26 01:53:03,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3164653.3333333335, ans=0.09899494936611666 2023-11-26 01:53:20,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.765e+01 9.381e+01 1.013e+02 1.424e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 01:53:21,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3164720.0, ans=0.0 2023-11-26 01:53:34,543 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5800, loss[loss=0.0627, simple_loss=0.08412, pruned_loss=0.01293, audio_tagging_loss=0.007709, over 16057.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.0889, pruned_loss=0.01239, audio_tagging_loss=0.009079, over 3048271.76 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:53:54,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3164920.0, ans=0.125 2023-11-26 01:53:56,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3164986.6666666665, ans=0.125 2023-11-26 01:53:57,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474750 2023-11-26 01:54:01,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=12.0 2023-11-26 01:54:02,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3164986.6666666665, ans=0.125 2023-11-26 01:54:16,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3165053.3333333335, ans=0.125 2023-11-26 01:54:27,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.75 vs. 
limit=12.0 2023-11-26 01:54:30,597 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5850, loss[loss=0.05343, simple_loss=0.06597, pruned_loss=0.007455, audio_tagging_loss=0.01299, over 15557.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08942, pruned_loss=0.01261, audio_tagging_loss=0.009077, over 3048011.27 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:54:37,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3165186.6666666665, ans=0.0 2023-11-26 01:54:49,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3165253.3333333335, ans=0.125 2023-11-26 01:54:52,399 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474800 2023-11-26 01:54:55,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-11-26 01:54:55,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2023-11-26 01:54:55,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3165320.0, ans=0.0 2023-11-26 01:54:59,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-26 01:55:02,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3165320.0, ans=0.2 2023-11-26 01:55:11,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.672e+01 9.332e+01 1.005e+02 2.095e+02, threshold=1.866e+02, percent-clipped=1.0 2023-11-26 01:55:17,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3165453.3333333335, ans=0.2 2023-11-26 01:55:22,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-26 01:55:26,150 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5900, loss[loss=0.07743, simple_loss=0.1108, pruned_loss=0.01564, audio_tagging_loss=0.006371, over 15120.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08939, pruned_loss=0.01255, audio_tagging_loss=0.009034, over 3040835.23 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:55:45,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3165586.6666666665, ans=0.5 2023-11-26 01:55:48,408 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474850 2023-11-26 01:55:50,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3165653.3333333335, ans=0.125 2023-11-26 01:55:58,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3165720.0, ans=0.125 2023-11-26 01:56:18,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3165786.6666666665, ans=0.05 2023-11-26 01:56:21,177 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5950, loss[loss=0.05092, simple_loss=0.06735, pruned_loss=0.007866, audio_tagging_loss=0.009381, over 15687.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08847, pruned_loss=0.01231, audio_tagging_loss=0.009006, over 3045865.80 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:56:24,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3165853.3333333335, ans=0.125 2023-11-26 01:56:26,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3165853.3333333335, ans=0.125 2023-11-26 01:56:26,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3165853.3333333335, ans=0.125 2023-11-26 01:56:37,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3165920.0, ans=0.125 2023-11-26 01:56:37,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3165920.0, ans=0.0 2023-11-26 01:56:44,133 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474900 2023-11-26 01:57:02,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.504e+01 9.073e+01 9.680e+01 1.067e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-26 01:57:02,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3166053.3333333335, ans=0.0 2023-11-26 01:57:08,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3166120.0, ans=0.0 2023-11-26 01:57:10,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.01 vs. limit=22.5 2023-11-26 01:57:17,305 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6000, loss[loss=0.04694, simple_loss=0.05732, pruned_loss=0.008353, audio_tagging_loss=0.009929, over 14942.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08784, pruned_loss=0.0122, audio_tagging_loss=0.008999, over 3039037.31 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:57:17,305 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 01:57:41,139 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1555, 3.9794, 3.7423, 3.2798], device='cuda:2') 2023-11-26 01:57:49,492 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.0577, simple_loss=0.05067, pruned_loss=0.005162, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-26 01:57:49,492 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 01:58:09,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3166253.3333333335, ans=0.125 2023-11-26 01:58:12,981 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474950 2023-11-26 01:58:30,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2023-11-26 01:58:30,860 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:58:45,666 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6050, loss[loss=0.06388, simple_loss=0.08418, pruned_loss=0.01228, audio_tagging_loss=0.009505, over 15349.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08759, pruned_loss=0.01226, audio_tagging_loss=0.008914, over 3039942.77 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:58:53,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-26 01:58:53,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3166520.0, ans=0.0 2023-11-26 01:59:08,381 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475000 2023-11-26 01:59:08,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3166653.3333333335, ans=0.125 2023-11-26 01:59:27,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.819e+01 9.341e+01 9.960e+01 1.201e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 01:59:32,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3166786.6666666665, ans=0.04949747468305833 2023-11-26 01:59:42,392 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6100, loss[loss=0.07437, simple_loss=0.1097, pruned_loss=0.01376, audio_tagging_loss=0.005757, over 15758.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.0879, pruned_loss=0.01225, audio_tagging_loss=0.008884, over 3043520.87 frames. 
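Validation here fires at batch 6000, consistent with a fixed interval of 3000 training batches, and is followed by a peak-memory report. A minimal sketch of that cadence using standard PyTorch calls; the interval is inferred from the batch index, and compute_validation_loss is an assumed helper:

import torch

VALID_INTERVAL = 3000  # inferred from validation landing on batch 6000

def maybe_validate(batch_idx: int, compute_validation_loss) -> None:
    # Sketch of the cadence behind "Computing validation loss" plus the
    # "Maximum memory allocated" line that follows it.
    if batch_idx % VALID_INTERVAL != 0:
        return
    loss = compute_validation_loss()
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: loss={loss:.4f}")
    print(f"Maximum memory allocated so far is {mb}MB")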
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:59:49,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3166853.3333333335, ans=0.0 2023-11-26 02:00:04,230 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475050 2023-11-26 02:00:17,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2023-11-26 02:00:19,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3167053.3333333335, ans=0.09899494936611666 2023-11-26 02:00:27,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3167120.0, ans=0.95 2023-11-26 02:00:35,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3167120.0, ans=0.1 2023-11-26 02:00:36,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3167186.6666666665, ans=0.015 2023-11-26 02:00:37,605 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6150, loss[loss=0.05621, simple_loss=0.07574, pruned_loss=0.008763, audio_tagging_loss=0.009579, over 15016.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08718, pruned_loss=0.0122, audio_tagging_loss=0.009091, over 3040636.14 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:00:59,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=15.0 2023-11-26 02:01:00,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475100 2023-11-26 02:01:15,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3167386.6666666665, ans=0.125 2023-11-26 02:01:16,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3167386.6666666665, ans=0.95 2023-11-26 02:01:18,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.792e+01 9.265e+01 9.786e+01 1.351e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 02:01:19,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3167386.6666666665, ans=0.0 2023-11-26 02:01:20,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3167386.6666666665, ans=0.1 2023-11-26 02:01:24,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3167453.3333333335, ans=0.1 2023-11-26 02:01:33,682 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6200, loss[loss=0.061, simple_loss=0.08635, pruned_loss=0.009881, audio_tagging_loss=0.007949, over 16036.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08767, pruned_loss=0.01216, audio_tagging_loss=0.009063, over 3041847.33 frames. 
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:01:41,885 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:01:48,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.31 vs. limit=22.5 2023-11-26 02:01:56,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475150 2023-11-26 02:02:16,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3167720.0, ans=0.0 2023-11-26 02:02:30,228 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6250, loss[loss=0.05377, simple_loss=0.06293, pruned_loss=0.009664, audio_tagging_loss=0.01264, over 14871.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08781, pruned_loss=0.01213, audio_tagging_loss=0.009089, over 3037844.95 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:02:51,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475200 2023-11-26 02:02:52,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-26 02:03:11,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.743e+01 9.437e+01 1.009e+02 1.277e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 02:03:25,474 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6300, loss[loss=0.05617, simple_loss=0.07345, pruned_loss=0.0106, audio_tagging_loss=0.008843, over 15505.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08826, pruned_loss=0.01209, audio_tagging_loss=0.009143, over 3044916.18 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:03:48,505 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475250 2023-11-26 02:03:56,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3168320.0, ans=0.125 2023-11-26 02:04:04,081 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:04:12,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2023-11-26 02:04:13,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3168453.3333333335, ans=0.125 2023-11-26 02:04:20,886 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6350, loss[loss=0.06785, simple_loss=0.0938, pruned_loss=0.01267, audio_tagging_loss=0.008285, over 15100.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08827, pruned_loss=0.01215, audio_tagging_loss=0.00919, over 3040552.69 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:04:26,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3168520.0, ans=0.0 2023-11-26 02:04:32,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3168586.6666666665, ans=0.125 2023-11-26 02:04:44,218 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-26 02:05:00,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3168720.0, ans=0.125 2023-11-26 02:05:02,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.674e+01 9.185e+01 1.006e+02 1.507e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 02:05:12,211 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=22.5 2023-11-26 02:05:17,597 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6400, loss[loss=0.07177, simple_loss=0.09098, pruned_loss=0.01664, audio_tagging_loss=0.009641, over 14886.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08935, pruned_loss=0.0123, audio_tagging_loss=0.009134, over 3044887.63 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:05:38,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3168986.6666666665, ans=0.2 2023-11-26 02:05:38,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-26 02:06:12,562 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6450, loss[loss=0.05267, simple_loss=0.06554, pruned_loss=0.009069, audio_tagging_loss=0.01083, over 15447.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08958, pruned_loss=0.01235, audio_tagging_loss=0.009204, over 3037563.28 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:06:34,476 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475400 2023-11-26 02:06:55,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.666e+01 9.241e+01 9.984e+01 1.381e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 02:06:57,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3169453.3333333335, ans=0.125 2023-11-26 02:07:04,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3169453.3333333335, ans=0.125 2023-11-26 02:07:08,092 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6500, loss[loss=0.05936, simple_loss=0.07928, pruned_loss=0.0135, audio_tagging_loss=0.006226, over 16479.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09014, pruned_loss=0.01259, audio_tagging_loss=0.009151, over 3034185.18 frames. 
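Each "Clipping_scale=2.0, grad-norm quartiles ..." record lists min/25%/median/75%/max of recent gradient norms, and in every record in this section the threshold is exactly clipping_scale times the median (here 2.0 * 9.185e+01 = 1.837e+02), with percent-clipped counting batches that exceeded it. A hedged sketch of median-based adaptive clipping, inferred from that relationship rather than from optim.py:

from collections import deque
import statistics

# Sketch: keep a window of recent grad norms; clip to scale * median.
class AdaptiveClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_factor(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        if len(self.norms) < 8:                      # warm-up: don't clip yet
            return 1.0
        threshold = self.scale * statistics.median(self.norms)
        return min(1.0, threshold / grad_norm)       # multiply grads by this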
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:07:09,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3169520.0, ans=0.125 2023-11-26 02:07:31,667 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475450 2023-11-26 02:07:32,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3169653.3333333335, ans=0.1 2023-11-26 02:07:38,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3169653.3333333335, ans=0.0 2023-11-26 02:07:55,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3169786.6666666665, ans=0.2 2023-11-26 02:08:04,752 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6550, loss[loss=0.07741, simple_loss=0.1066, pruned_loss=0.01591, audio_tagging_loss=0.008194, over 16051.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08993, pruned_loss=0.01253, audio_tagging_loss=0.008942, over 3035094.89 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:08:06,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3169853.3333333335, ans=0.05 2023-11-26 02:08:09,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3169853.3333333335, ans=0.125 2023-11-26 02:08:14,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-26 02:08:15,103 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.92 vs. limit=22.5 2023-11-26 02:08:27,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475500 2023-11-26 02:08:32,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3169986.6666666665, ans=0.125 2023-11-26 02:08:47,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.547e+01 9.134e+01 1.014e+02 1.239e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 02:08:58,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. limit=10.0 2023-11-26 02:09:00,757 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6600, loss[loss=0.0775, simple_loss=0.108, pruned_loss=0.01323, audio_tagging_loss=0.01026, over 15292.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09018, pruned_loss=0.01247, audio_tagging_loss=0.008928, over 3037702.31 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:09:14,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3170253.3333333335, ans=0.0 2023-11-26 02:09:22,490 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475550 2023-11-26 02:09:24,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3170320.0, ans=0.1 2023-11-26 02:09:30,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.49 vs. 
limit=15.0 2023-11-26 02:09:44,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3170453.3333333335, ans=0.125 2023-11-26 02:09:55,794 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6650, loss[loss=0.07587, simple_loss=0.1103, pruned_loss=0.01409, audio_tagging_loss=0.006602, over 15417.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08999, pruned_loss=0.01241, audio_tagging_loss=0.008912, over 3034056.43 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:09:57,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3170520.0, ans=0.0 2023-11-26 02:10:19,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475600 2023-11-26 02:10:33,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3170720.0, ans=0.5 2023-11-26 02:10:38,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.810e+01 9.306e+01 1.020e+02 1.538e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 02:10:52,577 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6700, loss[loss=0.05333, simple_loss=0.07488, pruned_loss=0.006044, audio_tagging_loss=0.00985, over 14980.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09089, pruned_loss=0.01262, audio_tagging_loss=0.008788, over 3036834.31 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:10:53,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-26 02:11:14,721 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475650 2023-11-26 02:11:18,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3170986.6666666665, ans=0.125 2023-11-26 02:11:20,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3170986.6666666665, ans=0.0 2023-11-26 02:11:24,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3171053.3333333335, ans=0.0 2023-11-26 02:11:48,605 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6750, loss[loss=0.07471, simple_loss=0.1031, pruned_loss=0.01516, audio_tagging_loss=0.008014, over 15269.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09037, pruned_loss=0.01254, audio_tagging_loss=0.008838, over 3034783.25 frames. 
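The Whitening records compare a per-module statistic against a budget ("metric=2.19 vs. limit=6.0"); the natural reading is that a penalty kicks in only when the metric exceeds the limit. A metric with the right behavior, illustrative rather than necessarily scaling.py's exact formula, is d * trace(C^2) / trace(C)^2 over the activation covariance C: by Cauchy-Schwarz it is at least 1, with equality exactly when C is a multiple of the identity, so larger values mean a more lopsided, less white spectrum:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) for one whitening group.
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.T @ x) / x.shape[0]                 # covariance, (C, C)
    d = c.shape[0]
    return (d * (c * c).sum() / c.trace() ** 2).item()

x = torch.randn(1000, 192)                     # near-white activations
assert whitening_metric(x) < 1.5               # close to the ideal value 1.0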
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:11:48,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3171186.6666666665, ans=0.95 2023-11-26 02:12:03,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3171253.3333333335, ans=0.125 2023-11-26 02:12:10,179 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475700 2023-11-26 02:12:24,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3171386.6666666665, ans=0.125 2023-11-26 02:12:30,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.571e+01 9.016e+01 1.004e+02 1.567e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-26 02:12:35,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3171453.3333333335, ans=0.0 2023-11-26 02:12:43,783 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6800, loss[loss=0.08019, simple_loss=0.1141, pruned_loss=0.01624, audio_tagging_loss=0.006912, over 15455.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08985, pruned_loss=0.01255, audio_tagging_loss=0.008937, over 3032720.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:13:06,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475750 2023-11-26 02:13:19,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-11-26 02:13:19,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3171720.0, ans=0.0 2023-11-26 02:13:37,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=15.0 2023-11-26 02:13:38,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3171853.3333333335, ans=0.0 2023-11-26 02:13:39,516 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6850, loss[loss=0.07308, simple_loss=0.09671, pruned_loss=0.01788, audio_tagging_loss=0.006839, over 16161.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.08999, pruned_loss=0.01277, audio_tagging_loss=0.008836, over 3035485.98 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:14:02,214 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475800 2023-11-26 02:14:03,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3171986.6666666665, ans=0.1 2023-11-26 02:14:18,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3172053.3333333335, ans=0.125 2023-11-26 02:14:22,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.606e+01 9.440e+01 1.002e+02 1.257e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 02:14:27,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3172120.0, ans=0.0 2023-11-26 02:14:35,825 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6900, loss[loss=0.05926, simple_loss=0.08375, pruned_loss=0.01055, audio_tagging_loss=0.006834, over 14990.00 frames. 
], tot_loss[loss=0.06629, simple_loss=0.08974, pruned_loss=0.01266, audio_tagging_loss=0.008757, over 3028408.63 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:14:35,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3172186.6666666665, ans=0.1 2023-11-26 02:14:50,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3172253.3333333335, ans=0.2 2023-11-26 02:14:52,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3172253.3333333335, ans=0.1 2023-11-26 02:14:53,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3172253.3333333335, ans=0.0 2023-11-26 02:14:54,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3172253.3333333335, ans=0.0 2023-11-26 02:14:58,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-26 02:15:00,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3172320.0, ans=0.125 2023-11-26 02:15:06,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3172320.0, ans=0.125 2023-11-26 02:15:08,258 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:15:10,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3172386.6666666665, ans=0.0 2023-11-26 02:15:17,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3172386.6666666665, ans=0.125 2023-11-26 02:15:19,169 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:15:19,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-26 02:15:27,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3172453.3333333335, ans=0.1 2023-11-26 02:15:31,324 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6950, loss[loss=0.06052, simple_loss=0.07928, pruned_loss=0.01133, audio_tagging_loss=0.009552, over 14234.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08925, pruned_loss=0.01254, audio_tagging_loss=0.008712, over 3029731.69 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:15:39,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3172520.0, ans=0.125 2023-11-26 02:15:47,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. 
limit=15.0 2023-11-26 02:15:54,176 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475900 2023-11-26 02:16:01,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3172653.3333333335, ans=0.125 2023-11-26 02:16:14,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.522e+01 9.109e+01 9.823e+01 1.262e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 02:16:18,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3172786.6666666665, ans=6.0 2023-11-26 02:16:26,956 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7000, loss[loss=0.06264, simple_loss=0.08279, pruned_loss=0.0113, audio_tagging_loss=0.009942, over 14851.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08915, pruned_loss=0.0124, audio_tagging_loss=0.008795, over 3035488.02 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:16:38,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3172920.0, ans=10.0 2023-11-26 02:16:49,061 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475950 2023-11-26 02:16:52,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3172986.6666666665, ans=0.1 2023-11-26 02:16:56,248 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2023-11-26 02:16:56,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3172986.6666666665, ans=0.0 2023-11-26 02:16:57,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3172986.6666666665, ans=0.0 2023-11-26 02:17:01,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3173053.3333333335, ans=10.0 2023-11-26 02:17:21,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3173186.6666666665, ans=0.125 2023-11-26 02:17:22,620 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7050, loss[loss=0.04674, simple_loss=0.07007, pruned_loss=0.005681, audio_tagging_loss=0.006022, over 14461.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08857, pruned_loss=0.01217, audio_tagging_loss=0.008882, over 3034004.51 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:17:27,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3173186.6666666665, ans=0.1 2023-11-26 02:17:44,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476000 2023-11-26 02:18:08,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.411e+01 9.041e+01 9.968e+01 1.223e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-26 02:18:08,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3173453.3333333335, ans=0.0 2023-11-26 02:18:19,704 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7100, loss[loss=0.07484, simple_loss=0.09718, pruned_loss=0.01913, audio_tagging_loss=0.007121, over 14411.00 frames. 
], tot_loss[loss=0.0665, simple_loss=0.09021, pruned_loss=0.01249, audio_tagging_loss=0.008899, over 3040734.19 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:18:42,523 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476050 2023-11-26 02:18:56,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. limit=22.5 2023-11-26 02:19:12,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173786.6666666665, ans=0.1 2023-11-26 02:19:15,667 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7150, loss[loss=0.06128, simple_loss=0.0762, pruned_loss=0.008987, audio_tagging_loss=0.01419, over 15769.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08988, pruned_loss=0.01234, audio_tagging_loss=0.009048, over 3049017.04 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:19:17,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3173853.3333333335, ans=0.125 2023-11-26 02:19:25,902 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:19:37,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476100 2023-11-26 02:19:49,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3174053.3333333335, ans=10.0 2023-11-26 02:19:58,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.796e+01 9.304e+01 9.946e+01 1.523e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 02:20:09,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3174120.0, ans=0.0 2023-11-26 02:20:11,673 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7200, loss[loss=0.07544, simple_loss=0.1098, pruned_loss=0.01311, audio_tagging_loss=0.007415, over 15780.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08975, pruned_loss=0.0125, audio_tagging_loss=0.009092, over 3043668.42 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:20:11,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3174186.6666666665, ans=0.0 2023-11-26 02:20:19,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2023-11-26 02:20:30,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3174253.3333333335, ans=0.125 2023-11-26 02:20:33,551 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476150 2023-11-26 02:20:40,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3174320.0, ans=0.0 2023-11-26 02:20:44,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3174386.6666666665, ans=0.0 2023-11-26 02:20:47,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3174386.6666666665, ans=0.2 2023-11-26 02:20:50,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3174386.6666666665, ans=0.1 2023-11-26 02:21:03,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3174453.3333333335, ans=0.0 2023-11-26 02:21:06,663 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7250, loss[loss=0.04772, simple_loss=0.06488, pruned_loss=0.005908, audio_tagging_loss=0.009374, over 14062.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08986, pruned_loss=0.0125, audio_tagging_loss=0.009163, over 3048781.00 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:21:29,711 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476200 2023-11-26 02:21:30,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3174653.3333333335, ans=0.09899494936611666 2023-11-26 02:21:30,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3174653.3333333335, ans=0.125 2023-11-26 02:21:42,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3174720.0, ans=0.0 2023-11-26 02:21:51,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.421e+01 9.196e+01 9.750e+01 1.203e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 02:21:54,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3174786.6666666665, ans=0.0 2023-11-26 02:22:02,900 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7300, loss[loss=0.06319, simple_loss=0.08275, pruned_loss=0.01439, audio_tagging_loss=0.007428, over 16104.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09036, pruned_loss=0.01259, audio_tagging_loss=0.009045, over 3042349.96 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:22:05,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3174853.3333333335, ans=0.125 2023-11-26 02:22:20,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. 
limit=15.0 2023-11-26 02:22:21,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3174920.0, ans=0.0 2023-11-26 02:22:25,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476250 2023-11-26 02:22:35,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3175053.3333333335, ans=0.1 2023-11-26 02:22:55,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3175120.0, ans=0.2 2023-11-26 02:22:58,973 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7350, loss[loss=0.07277, simple_loss=0.1043, pruned_loss=0.01354, audio_tagging_loss=0.007098, over 14779.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09018, pruned_loss=0.01257, audio_tagging_loss=0.008927, over 3042873.47 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:23:02,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0 2023-11-26 02:23:07,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3175186.6666666665, ans=0.125 2023-11-26 02:23:07,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2023-11-26 02:23:20,504 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476300 2023-11-26 02:23:39,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3175386.6666666665, ans=0.125 2023-11-26 02:23:40,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3175386.6666666665, ans=0.125 2023-11-26 02:23:41,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3175386.6666666665, ans=0.125 2023-11-26 02:23:44,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.628e+01 9.246e+01 9.778e+01 1.248e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 02:23:54,428 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7400, loss[loss=0.06032, simple_loss=0.08732, pruned_loss=0.009439, audio_tagging_loss=0.007225, over 13957.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09043, pruned_loss=0.01254, audio_tagging_loss=0.008897, over 3046683.95 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:24:16,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476350 2023-11-26 02:24:30,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3175720.0, ans=0.125 2023-11-26 02:24:34,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3175720.0, ans=0.09899494936611666 2023-11-26 02:24:35,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3175720.0, ans=0.2 2023-11-26 02:24:44,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3175786.6666666665, ans=0.0 2023-11-26 02:24:44,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3175786.6666666665, ans=0.0 2023-11-26 02:24:49,189 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7450, loss[loss=0.08198, simple_loss=0.1231, pruned_loss=0.01323, audio_tagging_loss=0.007169, over 16123.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09048, pruned_loss=0.01256, audio_tagging_loss=0.008829, over 3048300.66 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:24:51,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0 2023-11-26 02:24:53,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3175853.3333333335, ans=15.0 2023-11-26 02:24:57,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3175853.3333333335, ans=0.0 2023-11-26 02:24:58,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2023-11-26 02:25:12,739 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476400 2023-11-26 02:25:16,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3175986.6666666665, ans=0.2 2023-11-26 02:25:26,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.72 vs. limit=12.0 2023-11-26 02:25:35,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.103e+01 8.524e+01 9.234e+01 9.933e+01 1.379e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 02:25:46,036 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7500, loss[loss=0.06361, simple_loss=0.08105, pruned_loss=0.01178, audio_tagging_loss=0.0113, over 14504.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09038, pruned_loss=0.01257, audio_tagging_loss=0.008812, over 3041384.41 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:26:07,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476450 2023-11-26 02:26:08,060 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:26:26,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3176386.6666666665, ans=0.1 2023-11-26 02:26:41,532 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7550, loss[loss=0.07103, simple_loss=0.09849, pruned_loss=0.01339, audio_tagging_loss=0.008398, over 16996.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09042, pruned_loss=0.01254, audio_tagging_loss=0.008768, over 3049927.07 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:26:57,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0 2023-11-26 02:27:03,125 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476500 2023-11-26 02:27:15,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3176720.0, ans=0.0 2023-11-26 02:27:18,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3176720.0, ans=0.0 2023-11-26 02:27:19,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3176720.0, ans=0.125 2023-11-26 02:27:26,801 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 8.615e+01 8.990e+01 9.647e+01 1.278e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-26 02:27:36,400 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7600, loss[loss=0.05709, simple_loss=0.07274, pruned_loss=0.009113, audio_tagging_loss=0.0116, over 16016.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.0897, pruned_loss=0.0125, audio_tagging_loss=0.008729, over 3058383.40 frames. 
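The grad_scale printed with each batch summary behaves like a dynamic fp16 loss scale: it doubles from 8.0 to 16.0 here, later reaches 32.0, and occasionally backs off again (16.0 at batch 8350, 32.0 at 8400, 16.0 at 8450), as a scaler does after gradient overflows. A minimal sketch using PyTorch's stock GradScaler; the training script may use its own variant, and init_scale and growth_interval below are illustrative, not the script's actual settings:

```python
# Dynamic fp16 loss scaling in the spirit of the grad_scale values above.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0,
                                   growth_factor=2.0,     # doubles on growth
                                   backoff_factor=0.5,    # halves on overflow
                                   growth_interval=2000)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward in reduced precision
        loss = model(batch)
    scaler.scale(loss).backward()        # backward on the scaled loss
    scaler.step(optimizer)               # unscales; skips the step on inf/nan
    scaler.update()                      # grows or backs off the scale
    return scaler.get_scale()            # the value logged as grad_scale
```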
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:27:43,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3176853.3333333335, ans=0.125 2023-11-26 02:27:45,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3176853.3333333335, ans=0.125 2023-11-26 02:27:59,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476550 2023-11-26 02:27:59,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3176986.6666666665, ans=0.125 2023-11-26 02:27:59,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3176986.6666666665, ans=0.125 2023-11-26 02:28:09,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3177053.3333333335, ans=0.0 2023-11-26 02:28:10,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3177053.3333333335, ans=0.0 2023-11-26 02:28:13,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3177053.3333333335, ans=0.125 2023-11-26 02:28:23,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3177120.0, ans=0.125 2023-11-26 02:28:31,835 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7650, loss[loss=0.05707, simple_loss=0.07482, pruned_loss=0.01131, audio_tagging_loss=0.008358, over 14335.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09054, pruned_loss=0.01259, audio_tagging_loss=0.008656, over 3051452.08 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:28:39,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3177186.6666666665, ans=0.2 2023-11-26 02:28:51,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3177253.3333333335, ans=0.125 2023-11-26 02:28:54,275 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476600 2023-11-26 02:29:18,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.582e+01 9.244e+01 1.012e+02 1.285e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 02:29:23,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3177453.3333333335, ans=0.0 2023-11-26 02:29:28,319 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7700, loss[loss=0.06081, simple_loss=0.07436, pruned_loss=0.01134, audio_tagging_loss=0.01228, over 14920.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09034, pruned_loss=0.01268, audio_tagging_loss=0.008703, over 3052602.18 frames. 
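The ScheduledFloat entries record module hyperparameters (balancer probabilities, skip rates, dropout rates) whose values are scheduled against the global batch count, with ans being the value in effect at that batch_count. A sketch of a piecewise-linear schedule in that spirit; this is hypothetical, not icefall's actual scaling.py implementation:

```python
# Sketch of a batch-count-scheduled hyperparameter, hypothetical.
from bisect import bisect_right

class ScheduledFloatSketch:
    """Piecewise-linear in batch_count, e.g. (0, 0.3) -> (20000, 0.1)."""

    def __init__(self, *points):
        self.points = sorted(points)             # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        if batch_count <= xs[0]:
            return self.points[0][1]
        if batch_count >= xs[-1]:
            return self.points[-1][1]            # schedule has flattened out
        i = bisect_right(xs, batch_count)
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (batch_count - x0) / (x1 - x0)       # linear interpolation
        return y0 + t * (y1 - y0)

# Illustrative schedule: by batch_count ~3.17e6 any such schedule has long
# since reached its final value, which is why each name logs a constant ans.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(3176986.67))            # -> 0.1
```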
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:29:37,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3177520.0, ans=0.05 2023-11-26 02:29:50,125 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476650 2023-11-26 02:30:01,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3177720.0, ans=0.0 2023-11-26 02:30:07,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3177720.0, ans=0.0 2023-11-26 02:30:11,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3177786.6666666665, ans=0.125 2023-11-26 02:30:11,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3177786.6666666665, ans=0.05 2023-11-26 02:30:13,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3177786.6666666665, ans=0.125 2023-11-26 02:30:23,202 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7750, loss[loss=0.06707, simple_loss=0.09067, pruned_loss=0.01316, audio_tagging_loss=0.008576, over 15474.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09116, pruned_loss=0.01283, audio_tagging_loss=0.008814, over 3051664.19 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:30:45,805 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476700 2023-11-26 02:31:04,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3178053.3333333335, ans=0.0 2023-11-26 02:31:07,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.684e+01 9.280e+01 1.001e+02 1.211e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 02:31:17,909 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7800, loss[loss=0.07195, simple_loss=0.09553, pruned_loss=0.01328, audio_tagging_loss=0.0109, over 16389.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09123, pruned_loss=0.01266, audio_tagging_loss=0.008807, over 3053312.81 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:31:19,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3178186.6666666665, ans=0.125 2023-11-26 02:31:40,644 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476750 2023-11-26 02:31:44,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3178320.0, ans=0.1 2023-11-26 02:31:45,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3178320.0, ans=0.0 2023-11-26 02:32:14,218 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7850, loss[loss=0.06161, simple_loss=0.0805, pruned_loss=0.0117, audio_tagging_loss=0.009664, over 14592.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09019, pruned_loss=0.01248, audio_tagging_loss=0.008937, over 3053044.79 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:32:14,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3178520.0, ans=0.125 2023-11-26 02:32:22,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3178520.0, ans=0.0 2023-11-26 02:32:35,296 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476800 2023-11-26 02:32:40,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3178653.3333333335, ans=0.125 2023-11-26 02:32:46,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3178720.0, ans=0.125 2023-11-26 02:32:48,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3178720.0, ans=0.0 2023-11-26 02:32:59,765 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.601e+01 9.556e+01 1.008e+02 1.371e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 02:33:09,234 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7900, loss[loss=0.07052, simple_loss=0.1019, pruned_loss=0.01087, audio_tagging_loss=0.0087, over 15450.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.0918, pruned_loss=0.01296, audio_tagging_loss=0.008987, over 3056910.75 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:33:15,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3178853.3333333335, ans=0.125 2023-11-26 02:33:20,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3178920.0, ans=0.0 2023-11-26 02:33:31,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476850 2023-11-26 02:33:40,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3178986.6666666665, ans=0.125 2023-11-26 02:33:56,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3179120.0, ans=0.0 2023-11-26 02:34:02,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-26 02:34:04,816 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7950, loss[loss=0.06871, simple_loss=0.1001, pruned_loss=0.01249, audio_tagging_loss=0.006166, over 14452.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09182, pruned_loss=0.01302, audio_tagging_loss=0.008987, over 3048617.72 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:34:19,520 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 02:34:19,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3179253.3333333335, ans=0.125 2023-11-26 02:34:27,425 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476900 2023-11-26 02:34:40,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3179386.6666666665, ans=0.0 2023-11-26 02:34:47,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3179386.6666666665, ans=0.0 2023-11-26 02:34:50,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.906e+01 9.549e+01 1.026e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 02:34:55,600 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.68 vs. limit=5.0 2023-11-26 02:34:57,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3179453.3333333335, ans=0.125 2023-11-26 02:34:59,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.09 vs. limit=22.5 2023-11-26 02:35:00,598 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8000, loss[loss=0.06335, simple_loss=0.08719, pruned_loss=0.01007, audio_tagging_loss=0.00969, over 15841.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.08994, pruned_loss=0.01273, audio_tagging_loss=0.009142, over 3050277.35 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:35:20,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3179586.6666666665, ans=0.2 2023-11-26 02:35:22,366 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476950 2023-11-26 02:35:26,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2023-11-26 02:35:35,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3179720.0, ans=0.0 2023-11-26 02:35:38,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3179720.0, ans=0.1 2023-11-26 02:35:39,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3179720.0, ans=0.125 2023-11-26 02:35:56,033 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8050, loss[loss=0.06967, simple_loss=0.09951, pruned_loss=0.01259, audio_tagging_loss=0.007321, over 15458.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.08961, pruned_loss=0.01272, audio_tagging_loss=0.009215, over 3048842.14 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:36:05,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3179920.0, ans=0.0 2023-11-26 02:36:13,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=22.5 2023-11-26 02:36:14,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.92 vs. 
limit=15.0 2023-11-26 02:36:18,249 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477000 2023-11-26 02:36:25,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2023-11-26 02:36:27,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3179986.6666666665, ans=0.0 2023-11-26 02:36:27,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3179986.6666666665, ans=10.0 2023-11-26 02:36:41,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.827e+01 9.529e+01 1.030e+02 1.385e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 02:36:44,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3180120.0, ans=0.0 2023-11-26 02:36:51,849 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8100, loss[loss=0.08287, simple_loss=0.1164, pruned_loss=0.01484, audio_tagging_loss=0.009843, over 15452.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09036, pruned_loss=0.01288, audio_tagging_loss=0.009045, over 3049019.41 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:37:02,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3180253.3333333335, ans=0.125 2023-11-26 02:37:04,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3180253.3333333335, ans=0.125 2023-11-26 02:37:05,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3180253.3333333335, ans=0.05 2023-11-26 02:37:14,093 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477050 2023-11-26 02:37:31,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2023-11-26 02:37:45,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.01 vs. limit=10.0 2023-11-26 02:37:47,791 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8150, loss[loss=0.06598, simple_loss=0.09427, pruned_loss=0.01145, audio_tagging_loss=0.007402, over 15514.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09119, pruned_loss=0.01303, audio_tagging_loss=0.008856, over 3052323.01 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:38:05,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.50 vs. limit=10.0 2023-11-26 02:38:08,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2023-11-26 02:38:09,402 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477100 2023-11-26 02:38:33,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.566e+01 9.339e+01 1.007e+02 1.243e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 02:38:41,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.52 vs. 
limit=15.0 2023-11-26 02:38:43,160 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8200, loss[loss=0.06265, simple_loss=0.08595, pruned_loss=0.01077, audio_tagging_loss=0.008894, over 14991.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09076, pruned_loss=0.01293, audio_tagging_loss=0.008783, over 3049811.87 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:38:45,259 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:38:55,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3180920.0, ans=0.0 2023-11-26 02:39:04,739 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477150 2023-11-26 02:39:29,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3181120.0, ans=0.2 2023-11-26 02:39:33,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3181120.0, ans=0.1 2023-11-26 02:39:38,052 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8250, loss[loss=0.06797, simple_loss=0.09031, pruned_loss=0.013, audio_tagging_loss=0.009814, over 15545.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.0907, pruned_loss=0.01286, audio_tagging_loss=0.008866, over 3045498.85 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:40:00,632 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477200 2023-11-26 02:40:07,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-26 02:40:23,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.407e+01 9.078e+01 9.588e+01 1.625e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 02:40:24,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3181453.3333333335, ans=0.125 2023-11-26 02:40:34,004 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8300, loss[loss=0.06479, simple_loss=0.08846, pruned_loss=0.01152, audio_tagging_loss=0.009032, over 15720.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09067, pruned_loss=0.01288, audio_tagging_loss=0.008839, over 3047459.97 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:40:47,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3181586.6666666665, ans=0.125 2023-11-26 02:40:56,322 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477250 2023-11-26 02:41:11,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2023-11-26 02:41:27,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3181786.6666666665, ans=0.0 2023-11-26 02:41:29,719 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8350, loss[loss=0.06058, simple_loss=0.07997, pruned_loss=0.01297, audio_tagging_loss=0.007632, over 15374.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09052, pruned_loss=0.01282, audio_tagging_loss=0.008801, over 3048998.24 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:41:33,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3181853.3333333335, ans=0.125 2023-11-26 02:41:38,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3181853.3333333335, ans=0.0 2023-11-26 02:41:41,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-26 02:41:47,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3181920.0, ans=0.1 2023-11-26 02:41:52,064 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477300 2023-11-26 02:41:55,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3181986.6666666665, ans=0.125 2023-11-26 02:42:01,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3181986.6666666665, ans=0.125 2023-11-26 02:42:16,198 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.717e+01 9.531e+01 1.032e+02 1.340e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 02:42:21,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3182120.0, ans=0.125 2023-11-26 02:42:25,228 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8400, loss[loss=0.08151, simple_loss=0.1065, pruned_loss=0.01955, audio_tagging_loss=0.008735, over 15133.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09009, pruned_loss=0.01275, audio_tagging_loss=0.008898, over 3052004.43 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:42:27,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3182186.6666666665, ans=0.0 2023-11-26 02:42:33,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3182186.6666666665, ans=0.125 2023-11-26 02:42:47,842 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477350 2023-11-26 02:42:48,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3182320.0, ans=0.125 2023-11-26 02:42:57,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.33 vs. 
limit=12.0 2023-11-26 02:43:11,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3182453.3333333335, ans=0.1 2023-11-26 02:43:14,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3182453.3333333335, ans=0.0 2023-11-26 02:43:20,961 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8450, loss[loss=0.08591, simple_loss=0.1159, pruned_loss=0.01819, audio_tagging_loss=0.009786, over 14697.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08999, pruned_loss=0.01268, audio_tagging_loss=0.008878, over 3054675.96 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:43:29,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0 2023-11-26 02:43:32,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3182586.6666666665, ans=0.0 2023-11-26 02:43:40,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.78 vs. limit=10.0 2023-11-26 02:43:40,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-26 02:43:42,523 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477400 2023-11-26 02:43:43,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3182653.3333333335, ans=0.125 2023-11-26 02:43:56,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3182720.0, ans=0.125 2023-11-26 02:44:09,229 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.762e+01 9.219e+01 9.767e+01 1.234e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 02:44:16,646 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8500, loss[loss=0.06367, simple_loss=0.08226, pruned_loss=0.0114, audio_tagging_loss=0.01114, over 14782.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.08995, pruned_loss=0.01271, audio_tagging_loss=0.008904, over 3056594.26 frames. 
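The Whitening entries compare a per-module statistic against a fixed limit (metric=7.29 vs. limit=15.0 and similar above). The metric reads as a measure of how unevenly signal energy is spread across the covariance eigenvalues of a group of channels: 1.0 when all eigenvalues are equal (perfectly "white" features), growing as energy concentrates in a few directions, with the constraint pushing back only when the limit is exceeded, which is why most logged values sit below it. A hedged sketch of such a metric; the exact formula in scaling.py may differ:

```python
# Eigenvalue-spread metric in the spirit of the Whitening log entries.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one whitening group."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]                 # channel covariance
    eigs = torch.linalg.eigvalsh(cov)
    n = x.shape[1]
    return float(n * (eigs ** 2).sum() / eigs.sum() ** 2)

print(whitening_metric(torch.randn(2000, 384)))  # a bit above 1 for noise,
                                                 # from sampling error alone
```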
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:44:17,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3182853.3333333335, ans=0.125 2023-11-26 02:44:39,009 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477450 2023-11-26 02:44:41,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3182986.6666666665, ans=0.125 2023-11-26 02:44:42,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3182986.6666666665, ans=0.0 2023-11-26 02:44:42,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3182986.6666666665, ans=0.2 2023-11-26 02:44:49,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3183053.3333333335, ans=0.125 2023-11-26 02:44:51,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3183053.3333333335, ans=0.125 2023-11-26 02:44:57,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-26 02:45:03,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3183120.0, ans=0.125 2023-11-26 02:45:11,606 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8550, loss[loss=0.0664, simple_loss=0.08744, pruned_loss=0.01393, audio_tagging_loss=0.008755, over 16821.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09034, pruned_loss=0.01272, audio_tagging_loss=0.008919, over 3063237.47 frames. ], batch size: 65, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:45:22,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3183253.3333333335, ans=0.05 2023-11-26 02:45:24,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3183253.3333333335, ans=0.0 2023-11-26 02:45:34,871 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477500 2023-11-26 02:45:36,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3183320.0, ans=0.95 2023-11-26 02:45:44,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3183386.6666666665, ans=0.2 2023-11-26 02:45:46,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2023-11-26 02:45:48,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3183386.6666666665, ans=0.0 2023-11-26 02:45:59,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. 
limit=15.0 2023-11-26 02:45:59,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.847e+01 9.604e+01 1.040e+02 3.358e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-26 02:46:03,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3183453.3333333335, ans=0.0 2023-11-26 02:46:07,456 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8600, loss[loss=0.07362, simple_loss=0.08973, pruned_loss=0.01485, audio_tagging_loss=0.0139, over 14020.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09133, pruned_loss=0.01276, audio_tagging_loss=0.008904, over 3065776.28 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:46:15,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-26 02:46:22,352 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:46:29,610 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477550 2023-11-26 02:46:40,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3183720.0, ans=0.125 2023-11-26 02:46:54,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-26 02:46:55,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3183786.6666666665, ans=0.125 2023-11-26 02:47:01,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3183786.6666666665, ans=0.125 2023-11-26 02:47:03,345 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8650, loss[loss=0.07335, simple_loss=0.1026, pruned_loss=0.0151, audio_tagging_loss=0.006918, over 14485.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09186, pruned_loss=0.01282, audio_tagging_loss=0.008979, over 3063685.33 frames. 
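Each optim.py line summarizes the recent distribution of gradient norms as five quantiles, reading as min, 25%, median, 75%, max. In every entry in this stretch the logged threshold equals Clipping_scale times the median (here 2.0 * 9.604e+01 ~= 1.921e+02), and the nonzero percent-clipped=1.0 coincides with the 3.358e+02 outlier in the max position. A sketch of those statistics; the window over which they are aggregated is assumed, not known from this log:

```python
# Quartile / threshold / percent-clipped statistics as logged by optim.py.
import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    quartiles = torch.quantile(recent_grad_norms,
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]          # scale times the median
    percent_clipped = (recent_grad_norms > threshold).float().mean() * 100.0
    return quartiles, threshold, percent_clipped

# Illustrative five-norm window; the log aggregates over many more batches.
norms = torch.tensor([77.57, 88.47, 96.04, 104.0, 335.8])
q, thr, pct = clipping_stats(norms)
print(q.tolist(), float(thr), float(pct))   # threshold ~= 192.1
```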
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:47:10,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3183853.3333333335, ans=0.0 2023-11-26 02:47:17,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183920.0, ans=0.1 2023-11-26 02:47:18,277 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:47:19,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3183920.0, ans=0.125 2023-11-26 02:47:21,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183920.0, ans=0.1 2023-11-26 02:47:24,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477600 2023-11-26 02:47:50,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.714e+01 9.384e+01 1.008e+02 1.495e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 02:47:51,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3184120.0, ans=0.1 2023-11-26 02:47:55,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3184120.0, ans=0.125 2023-11-26 02:47:57,877 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8700, loss[loss=0.06748, simple_loss=0.09101, pruned_loss=0.01298, audio_tagging_loss=0.008991, over 14399.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09181, pruned_loss=0.0127, audio_tagging_loss=0.00898, over 3064720.25 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:48:00,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-26 02:48:08,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3184253.3333333335, ans=0.125 2023-11-26 02:48:21,286 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477650 2023-11-26 02:48:33,019 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:48:47,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2023-11-26 02:48:53,989 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8750, loss[loss=0.082, simple_loss=0.1183, pruned_loss=0.01748, audio_tagging_loss=0.005385, over 15987.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.0919, pruned_loss=0.01279, audio_tagging_loss=0.009019, over 3064981.51 frames. 
], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:49:10,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3184586.6666666665, ans=0.125 2023-11-26 02:49:16,031 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477700 2023-11-26 02:49:21,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3184653.3333333335, ans=0.125 2023-11-26 02:49:26,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3184720.0, ans=0.2 2023-11-26 02:49:27,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3184720.0, ans=0.125 2023-11-26 02:49:33,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3184720.0, ans=0.1 2023-11-26 02:49:40,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3184786.6666666665, ans=0.1 2023-11-26 02:49:41,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.883e+01 9.536e+01 1.029e+02 1.726e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 02:49:49,889 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8800, loss[loss=0.07804, simple_loss=0.09567, pruned_loss=0.01918, audio_tagging_loss=0.01102, over 14295.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09192, pruned_loss=0.01279, audio_tagging_loss=0.009101, over 3066757.92 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:49:53,322 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:50:03,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3184920.0, ans=0.1 2023-11-26 02:50:11,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477750 2023-11-26 02:50:18,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3184986.6666666665, ans=0.0 2023-11-26 02:50:25,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3185053.3333333335, ans=0.125 2023-11-26 02:50:34,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3185120.0, ans=0.1 2023-11-26 02:50:34,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3185120.0, ans=0.125 2023-11-26 02:50:44,757 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8850, loss[loss=0.05524, simple_loss=0.07541, pruned_loss=0.01045, audio_tagging_loss=0.007095, over 14208.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.0908, pruned_loss=0.01248, audio_tagging_loss=0.009167, over 3061735.33 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:50:49,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3185186.6666666665, ans=0.125 2023-11-26 02:50:54,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.98 vs. 
limit=12.0 2023-11-26 02:50:56,856 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:50:58,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3185253.3333333335, ans=0.125 2023-11-26 02:51:01,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3185253.3333333335, ans=0.125 2023-11-26 02:51:06,926 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477800 2023-11-26 02:51:11,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3185320.0, ans=0.125 2023-11-26 02:51:17,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-26 02:51:22,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-26 02:51:31,906 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.669e+01 9.316e+01 9.941e+01 1.332e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 02:51:39,820 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8900, loss[loss=0.06985, simple_loss=0.09721, pruned_loss=0.01498, audio_tagging_loss=0.006266, over 14405.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09055, pruned_loss=0.01238, audio_tagging_loss=0.009051, over 3067509.27 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:51:49,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3185520.0, ans=0.0 2023-11-26 02:52:02,493 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477850 2023-11-26 02:52:22,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3185720.0, ans=0.0 2023-11-26 02:52:22,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3185720.0, ans=0.2 2023-11-26 02:52:36,337 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8950, loss[loss=0.05742, simple_loss=0.07236, pruned_loss=0.009916, audio_tagging_loss=0.01133, over 14938.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09125, pruned_loss=0.01248, audio_tagging_loss=0.008818, over 3063432.43 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:52:36,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3185853.3333333335, ans=0.125 2023-11-26 02:52:45,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. 
limit=10.0 2023-11-26 02:52:47,023 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:52:57,349 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477900 2023-11-26 02:53:20,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3186120.0, ans=0.125 2023-11-26 02:53:21,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3186120.0, ans=0.125 2023-11-26 02:53:23,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.837e+01 9.264e+01 1.015e+02 1.330e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 02:53:31,279 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9000, loss[loss=0.06472, simple_loss=0.08924, pruned_loss=0.01483, audio_tagging_loss=0.005274, over 14696.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09178, pruned_loss=0.01263, audio_tagging_loss=0.008705, over 3064430.41 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:53:31,280 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 02:54:03,245 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.05846, simple_loss=0.05059, pruned_loss=0.005121, audio_tagging_loss=0.02804, over 4681554.00 frames. 2023-11-26 02:54:03,246 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 02:54:18,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3186253.3333333335, ans=0.05 2023-11-26 02:54:25,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477950 2023-11-26 02:54:44,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3186386.6666666665, ans=0.1 2023-11-26 02:54:50,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2023-11-26 02:54:53,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3186453.3333333335, ans=0.125 2023-11-26 02:54:59,522 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9050, loss[loss=0.06503, simple_loss=0.08647, pruned_loss=0.01283, audio_tagging_loss=0.008955, over 15303.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09134, pruned_loss=0.01257, audio_tagging_loss=0.008668, over 3063933.90 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:55:05,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3186520.0, ans=0.2 2023-11-26 02:55:11,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.61 vs. 
limit=15.0 2023-11-26 02:55:20,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478000 2023-11-26 02:55:38,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3186720.0, ans=0.0 2023-11-26 02:55:48,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.738e+01 9.239e+01 1.027e+02 1.338e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 02:55:53,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3186786.6666666665, ans=0.2 2023-11-26 02:55:54,935 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9100, loss[loss=0.04562, simple_loss=0.0534, pruned_loss=0.008791, audio_tagging_loss=0.01013, over 14635.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09118, pruned_loss=0.01256, audio_tagging_loss=0.008712, over 3054759.68 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:56:00,545 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:56:17,377 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478050 2023-11-26 02:56:28,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3187053.3333333335, ans=0.125 2023-11-26 02:56:30,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3187053.3333333335, ans=0.125 2023-11-26 02:56:50,661 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9150, loss[loss=0.05338, simple_loss=0.06976, pruned_loss=0.00875, audio_tagging_loss=0.009746, over 15168.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09028, pruned_loss=0.01249, audio_tagging_loss=0.008737, over 3047773.01 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 02:56:56,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3187186.6666666665, ans=0.125 2023-11-26 02:57:13,548 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478100 2023-11-26 02:57:34,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3187453.3333333335, ans=0.125 2023-11-26 02:57:39,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.630e+01 9.008e+01 9.650e+01 1.509e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-26 02:57:46,737 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9200, loss[loss=0.06902, simple_loss=0.09169, pruned_loss=0.01459, audio_tagging_loss=0.008582, over 15085.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09029, pruned_loss=0.01247, audio_tagging_loss=0.008734, over 3051328.01 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:58:08,646 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-26 02:58:29,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187720.0, ans=0.1 2023-11-26 02:58:42,637 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9250, loss[loss=0.08596, simple_loss=0.1185, pruned_loss=0.01746, audio_tagging_loss=0.009224, over 15192.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09029, pruned_loss=0.01249, audio_tagging_loss=0.008775, over 3050501.30 frames. 
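The "Maximum memory allocated so far is 26096MB" line printed after the batch-9000 validation pass above is presumably derived from PyTorch's allocator statistics; a sketch of how such a figure can be obtained (requires a CUDA build, and the device index is assumed):

```python
# Peak GPU memory in MB, as tracked by PyTorch's caching allocator.
import torch

def max_memory_mb(device: int = 0) -> int:
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)
```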
], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:58:45,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3187853.3333333335, ans=0.04949747468305833 2023-11-26 02:58:56,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2023-11-26 02:59:04,972 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-26 02:59:13,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3187986.6666666665, ans=0.125 2023-11-26 02:59:25,544 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:59:26,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3188120.0, ans=0.125 2023-11-26 02:59:26,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3188120.0, ans=0.125 2023-11-26 02:59:31,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.730e+01 8.687e+01 9.467e+01 1.008e+02 1.287e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 02:59:32,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3188120.0, ans=0.1 2023-11-26 02:59:38,523 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9300, loss[loss=0.09187, simple_loss=0.1377, pruned_loss=0.01806, audio_tagging_loss=0.004973, over 16135.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09014, pruned_loss=0.01245, audio_tagging_loss=0.008743, over 3054386.79 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:59:39,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3188186.6666666665, ans=0.125 2023-11-26 02:59:41,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3188186.6666666665, ans=0.0 2023-11-26 02:59:59,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3188320.0, ans=0.0 2023-11-26 03:00:00,851 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-26 03:00:04,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3188320.0, ans=0.07 2023-11-26 03:00:14,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3188386.6666666665, ans=0.0 2023-11-26 03:00:18,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3188386.6666666665, ans=0.0 2023-11-26 03:00:23,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3188453.3333333335, ans=0.2 2023-11-26 03:00:34,955 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9350, loss[loss=0.05812, simple_loss=0.07218, pruned_loss=0.01129, audio_tagging_loss=0.01074, over 14589.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08969, pruned_loss=0.01248, audio_tagging_loss=0.008846, over 3046984.09 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:00:35,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3188520.0, ans=0.125 2023-11-26 03:00:56,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-26 03:01:00,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-26 03:01:23,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.528e+01 9.262e+01 1.004e+02 1.387e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 03:01:29,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-26 03:01:30,309 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9400, loss[loss=0.07433, simple_loss=0.1064, pruned_loss=0.01396, audio_tagging_loss=0.007182, over 15732.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09004, pruned_loss=0.01248, audio_tagging_loss=0.008906, over 3044724.44 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:01:37,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-26 03:01:38,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3188853.3333333335, ans=0.125 2023-11-26 03:01:40,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3188920.0, ans=0.1 2023-11-26 03:01:52,674 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-26 03:02:23,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2023-11-26 03:02:24,987 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:02:26,070 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9450, loss[loss=0.07481, simple_loss=0.107, pruned_loss=0.01405, audio_tagging_loss=0.007269, over 15301.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08947, pruned_loss=0.01238, audio_tagging_loss=0.008962, over 3050346.09 frames. 
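The recurring WARNINGs above show why certain AudioSet cuts are excluded from ASR training: their transcripts are dummy placeholder text, and after the encoder's roughly 4x temporal subsampling a 100-frame (1-second) cut keeps only 23 frames, fewer than its 24 BPE tokens, so a transducer alignment cannot emit every token. A sketch of the implied filter; the exact frame arithmetic of the convolutional front-end is approximated:

```python
# Drop a cut when fewer encoder frames remain than BPE tokens, since a
# transducer needs at least one frame per emitted token. The "- 2" offset is
# an approximation chosen to match the log's 100 -> 23 frame reduction.
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    frames_after = num_frames // subsampling_factor - 2
    return frames_after >= num_tokens

print(keep_cut(100, 24))   # False -> excluded, matching the warning (23 < 24)
```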
], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:02:31,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3189186.6666666665, ans=0.125 2023-11-26 03:02:42,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3189253.3333333335, ans=0.0 2023-11-26 03:02:48,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3189320.0, ans=0.125 2023-11-26 03:02:49,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478400 2023-11-26 03:02:52,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3189320.0, ans=0.125 2023-11-26 03:03:02,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3189386.6666666665, ans=0.125 2023-11-26 03:03:16,854 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.757e+01 9.306e+01 1.009e+02 1.204e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:03:22,604 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9500, loss[loss=0.06264, simple_loss=0.09042, pruned_loss=0.01035, audio_tagging_loss=0.007081, over 16209.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08974, pruned_loss=0.01246, audio_tagging_loss=0.009002, over 3047073.14 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:03:23,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3189520.0, ans=0.0 2023-11-26 03:03:37,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3189586.6666666665, ans=0.125 2023-11-26 03:03:43,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3189586.6666666665, ans=0.0 2023-11-26 03:03:45,036 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478450 2023-11-26 03:04:15,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3189786.6666666665, ans=0.0 2023-11-26 03:04:18,286 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9550, loss[loss=0.05338, simple_loss=0.06275, pruned_loss=0.01065, audio_tagging_loss=0.01136, over 14086.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08924, pruned_loss=0.01247, audio_tagging_loss=0.009045, over 3042631.02 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:04:19,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3189853.3333333335, ans=0.1 2023-11-26 03:04:40,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=12.0 2023-11-26 03:04:40,860 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478500 2023-11-26 03:04:43,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.88 vs. 
limit=10.0 2023-11-26 03:04:45,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3189986.6666666665, ans=0.0 2023-11-26 03:04:54,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3190053.3333333335, ans=0.125 2023-11-26 03:04:57,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3190053.3333333335, ans=0.2 2023-11-26 03:05:08,240 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.706e+01 9.165e+01 9.979e+01 1.359e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 03:05:14,051 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9600, loss[loss=0.06814, simple_loss=0.08463, pruned_loss=0.01603, audio_tagging_loss=0.009798, over 15090.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08877, pruned_loss=0.01234, audio_tagging_loss=0.009092, over 3042546.97 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:05:28,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=8.0 2023-11-26 03:05:37,114 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478550 2023-11-26 03:05:57,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3190453.3333333335, ans=0.125 2023-11-26 03:06:01,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3190453.3333333335, ans=0.125 2023-11-26 03:06:06,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3190453.3333333335, ans=0.1 2023-11-26 03:06:10,227 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9650, loss[loss=0.07946, simple_loss=0.111, pruned_loss=0.01557, audio_tagging_loss=0.008396, over 15256.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08926, pruned_loss=0.01251, audio_tagging_loss=0.008999, over 3038830.29 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:06:11,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.47 vs. limit=6.0 2023-11-26 03:06:12,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3190520.0, ans=0.125 2023-11-26 03:06:31,817 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478600 2023-11-26 03:06:38,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3190653.3333333335, ans=0.125 2023-11-26 03:06:41,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3190653.3333333335, ans=0.125 2023-11-26 03:06:50,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=15.0 2023-11-26 03:07:00,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.580e+01 9.247e+01 1.001e+02 1.210e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 03:07:06,033 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9700, loss[loss=0.0938, simple_loss=0.1323, pruned_loss=0.01923, audio_tagging_loss=0.008423, over 15404.00 frames. 
], tot_loss[loss=0.06639, simple_loss=0.09005, pruned_loss=0.01257, audio_tagging_loss=0.008794, over 3040829.61 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:07:15,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3190920.0, ans=0.2 2023-11-26 03:07:26,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-26 03:07:28,289 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478650 2023-11-26 03:07:55,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3191120.0, ans=0.0 2023-11-26 03:08:01,016 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9750, loss[loss=0.08033, simple_loss=0.1123, pruned_loss=0.01604, audio_tagging_loss=0.008145, over 14273.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09056, pruned_loss=0.01263, audio_tagging_loss=0.008736, over 3046033.60 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:08:02,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0 2023-11-26 03:08:03,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3191186.6666666665, ans=0.0 2023-11-26 03:08:13,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3191253.3333333335, ans=0.0 2023-11-26 03:08:18,741 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:08:22,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3191253.3333333335, ans=0.125 2023-11-26 03:08:24,578 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478700 2023-11-26 03:08:25,177 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-26 03:08:40,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3191386.6666666665, ans=0.0 2023-11-26 03:08:42,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3191386.6666666665, ans=0.125 2023-11-26 03:08:45,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3191453.3333333335, ans=0.0 2023-11-26 03:08:52,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.589e+01 9.136e+01 9.841e+01 1.317e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 03:08:57,335 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9800, loss[loss=0.05791, simple_loss=0.08404, pruned_loss=0.007772, audio_tagging_loss=0.008119, over 15461.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09126, pruned_loss=0.0128, audio_tagging_loss=0.008691, over 3047279.96 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:09:01,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191520.0, ans=0.1 2023-11-26 03:09:02,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-26 03:09:03,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3191520.0, ans=0.0 2023-11-26 03:09:19,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478750 2023-11-26 03:09:30,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=12.0 2023-11-26 03:09:37,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0 2023-11-26 03:09:37,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3191720.0, ans=0.125 2023-11-26 03:09:48,435 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:09:53,711 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9850, loss[loss=0.08019, simple_loss=0.1184, pruned_loss=0.01462, audio_tagging_loss=0.006392, over 15693.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09202, pruned_loss=0.01292, audio_tagging_loss=0.008555, over 3046295.26 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:10:14,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478800 2023-11-26 03:10:43,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0 2023-11-26 03:10:44,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 8.656e+01 9.156e+01 9.841e+01 1.408e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 03:10:44,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3192120.0, ans=0.2 2023-11-26 03:10:48,796 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9900, loss[loss=0.05523, simple_loss=0.06983, pruned_loss=0.01134, audio_tagging_loss=0.00898, over 15047.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09188, pruned_loss=0.01301, audio_tagging_loss=0.0086, over 3047786.84 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:10:51,102 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:11:11,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478850 2023-11-26 03:11:11,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3192320.0, ans=0.125 2023-11-26 03:11:30,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0 2023-11-26 03:11:39,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3192453.3333333335, ans=0.125 2023-11-26 03:11:43,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-11-26 03:11:44,257 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9950, loss[loss=0.05123, simple_loss=0.07764, pruned_loss=0.005374, audio_tagging_loss=0.007041, over 16269.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09196, pruned_loss=0.01297, audio_tagging_loss=0.008583, over 3054438.34 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:11:54,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3192520.0, ans=0.125 2023-11-26 03:11:56,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3192586.6666666665, ans=0.0 2023-11-26 03:12:06,724 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478900 2023-11-26 03:12:18,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3192720.0, ans=0.125 2023-11-26 03:12:31,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3192786.6666666665, ans=0.025 2023-11-26 03:12:36,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.464e+01 9.280e+01 9.880e+01 1.249e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 03:12:37,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3192786.6666666665, ans=0.125 2023-11-26 03:12:40,793 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10000, loss[loss=0.07268, simple_loss=0.106, pruned_loss=0.01214, audio_tagging_loss=0.007558, over 15433.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09111, pruned_loss=0.01274, audio_tagging_loss=0.008594, over 3051399.75 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:12:52,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3192920.0, ans=0.1 2023-11-26 03:12:55,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.17 vs. 
limit=15.0 2023-11-26 03:13:01,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3192986.6666666665, ans=0.125 2023-11-26 03:13:02,286 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478950 2023-11-26 03:13:04,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3192986.6666666665, ans=0.0 2023-11-26 03:13:17,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3193053.3333333335, ans=0.1 2023-11-26 03:13:21,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3193053.3333333335, ans=0.0 2023-11-26 03:13:36,314 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10050, loss[loss=0.06366, simple_loss=0.08274, pruned_loss=0.01235, audio_tagging_loss=0.009939, over 15334.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.0909, pruned_loss=0.01271, audio_tagging_loss=0.00865, over 3050497.91 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:13:38,593 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:13:58,678 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479000 2023-11-26 03:14:20,465 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:14:26,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.389e+01 9.159e+01 9.802e+01 1.976e+02, threshold=1.832e+02, percent-clipped=1.0 2023-11-26 03:14:31,505 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10100, loss[loss=0.05764, simple_loss=0.06905, pruned_loss=0.01081, audio_tagging_loss=0.0123, over 14856.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09078, pruned_loss=0.01275, audio_tagging_loss=0.00875, over 3046830.09 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:14:38,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3193520.0, ans=0.2 2023-11-26 03:14:53,776 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479050 2023-11-26 03:14:57,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-26 03:15:07,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3193720.0, ans=0.05 2023-11-26 03:15:09,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3193720.0, ans=0.2 2023-11-26 03:15:16,530 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:15:20,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.43 vs. 
limit=15.0 2023-11-26 03:15:22,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-26 03:15:27,578 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10150, loss[loss=0.05149, simple_loss=0.07522, pruned_loss=0.007195, audio_tagging_loss=0.006689, over 15812.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09059, pruned_loss=0.01265, audio_tagging_loss=0.008846, over 3046106.39 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:15:27,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3193853.3333333335, ans=0.0 2023-11-26 03:15:37,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3193920.0, ans=0.125 2023-11-26 03:15:49,036 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479100 2023-11-26 03:15:53,164 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:16:10,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3194053.3333333335, ans=0.125 2023-11-26 03:16:12,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3194120.0, ans=0.0 2023-11-26 03:16:14,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3194120.0, ans=10.0 2023-11-26 03:16:19,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-26 03:16:19,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.867e+01 9.466e+01 1.017e+02 1.236e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 03:16:22,517 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2023-11-26 03:16:22,898 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10200, loss[loss=0.07402, simple_loss=0.1011, pruned_loss=0.01336, audio_tagging_loss=0.0101, over 14790.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.0908, pruned_loss=0.01281, audio_tagging_loss=0.008925, over 3043538.50 frames. 
], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:16:31,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3194186.6666666665, ans=0.0 2023-11-26 03:16:38,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3194253.3333333335, ans=0.125 2023-11-26 03:16:42,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3194253.3333333335, ans=0.035 2023-11-26 03:16:45,242 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:16:45,305 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479150 2023-11-26 03:16:51,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3194320.0, ans=0.125 2023-11-26 03:16:52,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194320.0, ans=0.125 2023-11-26 03:17:00,626 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:17:00,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3194386.6666666665, ans=0.04949747468305833 2023-11-26 03:17:17,606 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10250, loss[loss=0.08259, simple_loss=0.1186, pruned_loss=0.01726, audio_tagging_loss=0.006033, over 15525.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09166, pruned_loss=0.01288, audio_tagging_loss=0.008855, over 3047174.46 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:17:23,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3194520.0, ans=0.1 2023-11-26 03:17:29,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3194586.6666666665, ans=0.1 2023-11-26 03:17:41,081 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479200 2023-11-26 03:17:47,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3194653.3333333335, ans=0.1 2023-11-26 03:17:48,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3194653.3333333335, ans=0.0 2023-11-26 03:18:03,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3194786.6666666665, ans=0.025 2023-11-26 03:18:03,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194786.6666666665, ans=0.125 2023-11-26 03:18:05,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. 
limit=15.0 2023-11-26 03:18:10,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.528e+01 9.326e+01 1.007e+02 1.324e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 03:18:14,616 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10300, loss[loss=0.06378, simple_loss=0.08292, pruned_loss=0.01316, audio_tagging_loss=0.009156, over 15240.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09218, pruned_loss=0.0129, audio_tagging_loss=0.008842, over 3048437.71 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:18:21,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3194853.3333333335, ans=0.125 2023-11-26 03:18:29,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3194920.0, ans=0.2 2023-11-26 03:18:36,420 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479250 2023-11-26 03:18:36,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3194986.6666666665, ans=0.125 2023-11-26 03:18:47,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3195053.3333333335, ans=0.2 2023-11-26 03:19:04,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3195120.0, ans=0.125 2023-11-26 03:19:04,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3195120.0, ans=0.0 2023-11-26 03:19:10,598 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10350, loss[loss=0.0783, simple_loss=0.1008, pruned_loss=0.01651, audio_tagging_loss=0.01139, over 15151.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09143, pruned_loss=0.01278, audio_tagging_loss=0.009033, over 3045318.62 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:19:15,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3195186.6666666665, ans=0.125 2023-11-26 03:19:32,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479300 2023-11-26 03:19:42,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-26 03:19:48,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3195386.6666666665, ans=0.2 2023-11-26 03:19:50,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3195386.6666666665, ans=0.1 2023-11-26 03:20:03,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.860e+01 9.371e+01 1.044e+02 1.411e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 03:20:06,035 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10400, loss[loss=0.07638, simple_loss=0.108, pruned_loss=0.01437, audio_tagging_loss=0.008028, over 14627.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09131, pruned_loss=0.01279, audio_tagging_loss=0.009069, over 3041213.62 frames. 
], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:20:16,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3195586.6666666665, ans=0.07 2023-11-26 03:20:29,529 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479350 2023-11-26 03:20:43,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3195720.0, ans=0.125 2023-11-26 03:20:52,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3195786.6666666665, ans=0.125 2023-11-26 03:21:00,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3195786.6666666665, ans=0.0 2023-11-26 03:21:02,551 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10450, loss[loss=0.07093, simple_loss=0.09128, pruned_loss=0.01298, audio_tagging_loss=0.01231, over 15260.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09044, pruned_loss=0.01286, audio_tagging_loss=0.009114, over 3035881.81 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:21:04,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3195853.3333333335, ans=0.2 2023-11-26 03:21:09,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3195853.3333333335, ans=0.025 2023-11-26 03:21:25,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479400 2023-11-26 03:21:28,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3195986.6666666665, ans=0.0 2023-11-26 03:21:33,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3195986.6666666665, ans=0.125 2023-11-26 03:21:46,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-26 03:21:56,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.740e+01 9.314e+01 1.011e+02 1.304e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 03:21:59,373 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10500, loss[loss=0.05332, simple_loss=0.06437, pruned_loss=0.00844, audio_tagging_loss=0.0127, over 15357.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.08995, pruned_loss=0.01285, audio_tagging_loss=0.009024, over 3040223.77 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:22:10,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.23 vs. 
limit=15.0 2023-11-26 03:22:18,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3196253.3333333335, ans=0.125 2023-11-26 03:22:18,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3196253.3333333335, ans=0.1 2023-11-26 03:22:21,260 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479450 2023-11-26 03:22:23,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3196320.0, ans=0.125 2023-11-26 03:22:33,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3196386.6666666665, ans=0.09899494936611666 2023-11-26 03:22:35,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-26 03:22:46,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3196453.3333333335, ans=0.0 2023-11-26 03:22:54,781 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10550, loss[loss=0.079, simple_loss=0.1043, pruned_loss=0.01705, audio_tagging_loss=0.009807, over 14577.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09067, pruned_loss=0.01281, audio_tagging_loss=0.008876, over 3041691.98 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:22:57,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3196520.0, ans=0.0 2023-11-26 03:22:58,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3196520.0, ans=0.0 2023-11-26 03:22:59,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3196520.0, ans=0.05 2023-11-26 03:23:00,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3196520.0, ans=0.125 2023-11-26 03:23:02,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3196520.0, ans=0.125 2023-11-26 03:23:09,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2023-11-26 03:23:17,866 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479500 2023-11-26 03:23:29,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3196720.0, ans=0.125 2023-11-26 03:23:48,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.656e+01 9.441e+01 1.040e+02 1.486e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 03:23:49,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3196853.3333333335, ans=0.1 2023-11-26 03:23:50,620 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10600, loss[loss=0.03368, simple_loss=0.04196, pruned_loss=0.003481, audio_tagging_loss=0.009217, over 13387.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09052, pruned_loss=0.01283, audio_tagging_loss=0.008754, over 3042270.63 frames. 
], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:23:54,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3196853.3333333335, ans=0.0 2023-11-26 03:23:56,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3196853.3333333335, ans=0.0 2023-11-26 03:23:59,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3196853.3333333335, ans=0.125 2023-11-26 03:24:07,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3196920.0, ans=0.0 2023-11-26 03:24:13,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479550 2023-11-26 03:24:23,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2023-11-26 03:24:38,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3197120.0, ans=0.0 2023-11-26 03:24:45,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-26 03:24:46,463 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10650, loss[loss=0.05932, simple_loss=0.07748, pruned_loss=0.01207, audio_tagging_loss=0.008506, over 15358.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09091, pruned_loss=0.01298, audio_tagging_loss=0.008694, over 3041600.53 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:25:03,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3197253.3333333335, ans=0.125 2023-11-26 03:25:08,808 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479600 2023-11-26 03:25:23,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2023-11-26 03:25:25,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3197386.6666666665, ans=0.0 2023-11-26 03:25:32,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3197453.3333333335, ans=0.0 2023-11-26 03:25:41,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.674e+01 8.472e+01 9.133e+01 9.975e+01 1.277e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 03:25:42,592 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10700, loss[loss=0.06388, simple_loss=0.08858, pruned_loss=0.01011, audio_tagging_loss=0.009473, over 14466.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08943, pruned_loss=0.01263, audio_tagging_loss=0.008767, over 3039707.37 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:25:45,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.49 vs. 
limit=15.0 2023-11-26 03:25:49,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3197520.0, ans=0.125 2023-11-26 03:25:51,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3197520.0, ans=0.0 2023-11-26 03:26:00,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3197586.6666666665, ans=0.125 2023-11-26 03:26:05,521 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479650 2023-11-26 03:26:14,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3197653.3333333335, ans=0.1 2023-11-26 03:26:23,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3197720.0, ans=0.0 2023-11-26 03:26:28,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3197786.6666666665, ans=0.125 2023-11-26 03:26:38,381 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10750, loss[loss=0.05562, simple_loss=0.0749, pruned_loss=0.01121, audio_tagging_loss=0.006962, over 15008.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09015, pruned_loss=0.01273, audio_tagging_loss=0.008723, over 3035091.13 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:26:45,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3197853.3333333335, ans=0.125 2023-11-26 03:26:49,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3197920.0, ans=0.0 2023-11-26 03:27:00,718 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479700 2023-11-26 03:27:33,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.525e+01 9.180e+01 9.934e+01 1.273e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 03:27:34,666 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10800, loss[loss=0.06188, simple_loss=0.07139, pruned_loss=0.01696, audio_tagging_loss=0.009223, over 13139.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09048, pruned_loss=0.01271, audio_tagging_loss=0.008714, over 3042456.81 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:27:34,932 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:27:36,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3198186.6666666665, ans=0.125 2023-11-26 03:27:56,904 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479750 2023-11-26 03:28:03,495 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:28:15,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3198386.6666666665, ans=0.125 2023-11-26 03:28:29,942 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10850, loss[loss=0.0638, simple_loss=0.07559, pruned_loss=0.01492, audio_tagging_loss=0.01109, over 15585.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09038, pruned_loss=0.0127, audio_tagging_loss=0.008828, over 3037267.31 frames. 
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:28:39,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3198520.0, ans=0.125 2023-11-26 03:28:41,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=12.0 2023-11-26 03:28:44,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=15.0 2023-11-26 03:28:53,354 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479800 2023-11-26 03:29:09,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3198720.0, ans=0.0 2023-11-26 03:29:19,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-26 03:29:24,292 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:29:25,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.656e+01 9.276e+01 9.852e+01 1.367e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:29:26,359 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10900, loss[loss=0.05802, simple_loss=0.07948, pruned_loss=0.01162, audio_tagging_loss=0.006657, over 14987.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09033, pruned_loss=0.01256, audio_tagging_loss=0.008843, over 3042617.80 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:29:26,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3198853.3333333335, ans=0.125 2023-11-26 03:29:42,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-26 03:29:48,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479850 2023-11-26 03:29:54,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.30 vs. limit=22.5 2023-11-26 03:30:04,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3199053.3333333335, ans=0.2 2023-11-26 03:30:05,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3199053.3333333335, ans=0.125 2023-11-26 03:30:13,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3199120.0, ans=0.125 2023-11-26 03:30:22,722 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10950, loss[loss=0.05391, simple_loss=0.06497, pruned_loss=0.008164, audio_tagging_loss=0.01326, over 14231.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08948, pruned_loss=0.01231, audio_tagging_loss=0.008966, over 3039489.16 frames. 
], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:30:25,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3199186.6666666665, ans=10.0 2023-11-26 03:30:44,447 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479900 2023-11-26 03:30:51,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199320.0, ans=0.1 2023-11-26 03:30:54,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199386.6666666665, ans=0.1 2023-11-26 03:31:00,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3199386.6666666665, ans=0.025 2023-11-26 03:31:16,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.536e+01 9.275e+01 9.846e+01 1.244e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:31:17,923 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11000, loss[loss=0.08083, simple_loss=0.1126, pruned_loss=0.01494, audio_tagging_loss=0.009586, over 16919.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09056, pruned_loss=0.01246, audio_tagging_loss=0.008926, over 3039537.92 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:31:27,954 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:31:40,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479950 2023-11-26 03:31:43,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199653.3333333335, ans=0.1 2023-11-26 03:31:59,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3199720.0, ans=0.09899494936611666 2023-11-26 03:32:14,138 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11050, loss[loss=0.08648, simple_loss=0.1187, pruned_loss=0.01979, audio_tagging_loss=0.007342, over 15146.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09112, pruned_loss=0.01266, audio_tagging_loss=0.008936, over 3040250.74 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:32:36,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480000 2023-11-26 03:32:42,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199986.6666666665, ans=0.1 2023-11-26 03:32:42,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.60 vs. 
limit=12.0 2023-11-26 03:32:48,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3200053.3333333335, ans=0.0 2023-11-26 03:32:49,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3200053.3333333335, ans=0.0 2023-11-26 03:33:06,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3200120.0, ans=0.1 2023-11-26 03:33:11,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.630e+01 9.436e+01 1.031e+02 1.953e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-26 03:33:12,215 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11100, loss[loss=0.05716, simple_loss=0.07482, pruned_loss=0.01047, audio_tagging_loss=0.009286, over 14535.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09132, pruned_loss=0.01266, audio_tagging_loss=0.008985, over 3043154.01 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:33:33,463 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480050 2023-11-26 03:33:47,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3200386.6666666665, ans=0.125 2023-11-26 03:33:59,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3200453.3333333335, ans=0.2 2023-11-26 03:34:02,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3200453.3333333335, ans=0.125 2023-11-26 03:34:07,196 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11150, loss[loss=0.04343, simple_loss=0.05327, pruned_loss=0.007402, audio_tagging_loss=0.009392, over 14248.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09042, pruned_loss=0.01258, audio_tagging_loss=0.009196, over 3040061.97 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:34:11,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3200520.0, ans=0.2 2023-11-26 03:34:14,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3200520.0, ans=0.125 2023-11-26 03:34:22,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3200586.6666666665, ans=0.2 2023-11-26 03:34:23,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2023-11-26 03:34:26,659 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2023-11-26 03:34:29,462 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480100 2023-11-26 03:34:35,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. 
limit=15.0 2023-11-26 03:34:45,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3200720.0, ans=0.2 2023-11-26 03:34:47,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3200720.0, ans=15.0 2023-11-26 03:34:48,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3200720.0, ans=0.125 2023-11-26 03:34:48,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3200720.0, ans=0.0 2023-11-26 03:35:00,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.918e+01 9.641e+01 1.031e+02 1.753e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 03:35:02,650 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11200, loss[loss=0.06837, simple_loss=0.08695, pruned_loss=0.01421, audio_tagging_loss=0.01068, over 15893.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09028, pruned_loss=0.0124, audio_tagging_loss=0.009289, over 3045847.48 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:35:08,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2023-11-26 03:35:18,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3200920.0, ans=0.1 2023-11-26 03:35:19,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3200920.0, ans=0.0 2023-11-26 03:35:25,400 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480150 2023-11-26 03:35:29,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3200986.6666666665, ans=0.125 2023-11-26 03:35:38,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3201053.3333333335, ans=0.1 2023-11-26 03:35:59,475 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11250, loss[loss=0.06343, simple_loss=0.08926, pruned_loss=0.01017, audio_tagging_loss=0.00863, over 15614.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09007, pruned_loss=0.01231, audio_tagging_loss=0.009192, over 3048490.99 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:36:01,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3201186.6666666665, ans=0.125 2023-11-26 03:36:18,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3201253.3333333335, ans=0.0 2023-11-26 03:36:20,660 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480200 2023-11-26 03:36:24,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3201320.0, ans=0.125 2023-11-26 03:36:25,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3201320.0, ans=0.05 2023-11-26 03:36:37,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3201386.6666666665, ans=0.125 2023-11-26 03:36:48,993 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-11-26 03:36:53,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.665e+01 9.306e+01 1.002e+02 1.136e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:36:54,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2023-11-26 03:36:54,680 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11300, loss[loss=0.05948, simple_loss=0.09178, pruned_loss=0.009059, audio_tagging_loss=0.004526, over 16323.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09064, pruned_loss=0.0125, audio_tagging_loss=0.008914, over 3045711.60 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:37:08,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-26 03:37:10,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3201586.6666666665, ans=0.02 2023-11-26 03:37:14,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3201586.6666666665, ans=0.125 2023-11-26 03:37:16,556 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480250 2023-11-26 03:37:22,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3201653.3333333335, ans=0.0 2023-11-26 03:37:39,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3201786.6666666665, ans=0.0 2023-11-26 03:37:44,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3201786.6666666665, ans=0.0 2023-11-26 03:37:49,990 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11350, loss[loss=0.06126, simple_loss=0.07894, pruned_loss=0.01454, audio_tagging_loss=0.007251, over 15152.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09091, pruned_loss=0.01267, audio_tagging_loss=0.008745, over 3046367.73 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:37:50,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3201853.3333333335, ans=0.02 2023-11-26 03:37:54,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3201853.3333333335, ans=12.0 2023-11-26 03:37:54,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=3201853.3333333335, ans=15.0 2023-11-26 03:37:58,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-26 03:38:04,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3201920.0, ans=0.125 2023-11-26 03:38:13,016 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480300 2023-11-26 03:38:27,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3202053.3333333335, ans=0.05 2023-11-26 03:38:44,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.638e+01 9.308e+01 1.022e+02 1.333e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:38:45,433 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11400, loss[loss=0.06122, simple_loss=0.07991, pruned_loss=0.01222, audio_tagging_loss=0.009047, over 13393.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09119, pruned_loss=0.01263, audio_tagging_loss=0.008647, over 3041120.10 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:38:45,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3202186.6666666665, ans=0.0 2023-11-26 03:38:49,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-11-26 03:38:51,135 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:38:54,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202186.6666666665, ans=0.1 2023-11-26 03:39:03,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0 2023-11-26 03:39:07,942 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480350 2023-11-26 03:39:14,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-11-26 03:39:32,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3202453.3333333335, ans=0.125 2023-11-26 03:39:35,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3202453.3333333335, ans=0.2 2023-11-26 03:39:41,786 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11450, loss[loss=0.05431, simple_loss=0.07376, pruned_loss=0.009691, audio_tagging_loss=0.007737, over 16134.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09078, pruned_loss=0.01268, audio_tagging_loss=0.008636, over 3035707.56 frames. 
], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:39:48,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3202520.0, ans=0.0 2023-11-26 03:39:55,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3202586.6666666665, ans=0.09899494936611666 2023-11-26 03:40:03,246 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=22.5 2023-11-26 03:40:03,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480400 2023-11-26 03:40:24,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3202720.0, ans=0.125 2023-11-26 03:40:37,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.830e+01 9.338e+01 1.004e+02 1.564e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 03:40:37,387 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11500, loss[loss=0.0544, simple_loss=0.07414, pruned_loss=0.008221, audio_tagging_loss=0.009108, over 14549.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08949, pruned_loss=0.01247, audio_tagging_loss=0.008802, over 3036268.51 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:40:48,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3202920.0, ans=0.125 2023-11-26 03:41:00,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480450 2023-11-26 03:41:06,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3202986.6666666665, ans=0.0 2023-11-26 03:41:13,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-26 03:41:14,781 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:41:18,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3203053.3333333335, ans=0.125 2023-11-26 03:41:23,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3203120.0, ans=0.125 2023-11-26 03:41:33,096 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11550, loss[loss=0.06674, simple_loss=0.09297, pruned_loss=0.0144, audio_tagging_loss=0.005852, over 15388.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08972, pruned_loss=0.01252, audio_tagging_loss=0.008779, over 3041784.26 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:41:48,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.83 vs. 
limit=15.0 2023-11-26 03:41:50,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3203253.3333333335, ans=0.0 2023-11-26 03:41:50,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3203253.3333333335, ans=0.125 2023-11-26 03:41:55,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=15.0 2023-11-26 03:41:55,978 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480500 2023-11-26 03:42:09,078 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:42:20,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3203453.3333333335, ans=0.125 2023-11-26 03:42:29,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.932e+01 9.599e+01 1.033e+02 1.724e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 03:42:29,067 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11600, loss[loss=0.08475, simple_loss=0.119, pruned_loss=0.01833, audio_tagging_loss=0.006923, over 16012.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09076, pruned_loss=0.01277, audio_tagging_loss=0.008762, over 3038792.36 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:42:31,406 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:42:35,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3203520.0, ans=0.125 2023-11-26 03:42:35,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3203520.0, ans=0.125 2023-11-26 03:42:42,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3203586.6666666665, ans=0.125 2023-11-26 03:42:50,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480550 2023-11-26 03:42:51,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3203653.3333333335, ans=0.125 2023-11-26 03:42:53,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-26 03:42:56,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3203653.3333333335, ans=0.2 2023-11-26 03:43:02,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2023-11-26 03:43:24,211 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11650, loss[loss=0.06488, simple_loss=0.09165, pruned_loss=0.01214, audio_tagging_loss=0.006917, over 15178.00 frames. 
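The WARNING above drops a cut whose token sequence is longer than its subsampled feature sequence (24 tokens vs. 23 frames), which a transducer loss cannot align. A sketch of the apparent rule follows; the helper names and the ((T - 7) // 2 + 1) // 2 front-end formula are assumptions that happen to reproduce the 100 -> 23 frame count in the warning:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed convolutional front end; reproduces 100 -> 23 as logged.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one frame per emitted token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False   # 23 frames < 24 tokens -> excluded
```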
], tot_loss[loss=0.06701, simple_loss=0.0909, pruned_loss=0.01275, audio_tagging_loss=0.008813, over 3046820.39 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:43:27,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3203853.3333333335, ans=0.04949747468305833 2023-11-26 03:43:33,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3203853.3333333335, ans=0.125 2023-11-26 03:43:46,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480600 2023-11-26 03:44:02,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3204053.3333333335, ans=0.2 2023-11-26 03:44:15,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3204120.0, ans=0.125 2023-11-26 03:44:19,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3204186.6666666665, ans=0.0 2023-11-26 03:44:19,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.387e+01 9.006e+01 9.801e+01 1.650e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-26 03:44:19,977 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11700, loss[loss=0.07405, simple_loss=0.1072, pruned_loss=0.01424, audio_tagging_loss=0.006187, over 16249.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09025, pruned_loss=0.01259, audio_tagging_loss=0.008887, over 3044722.91 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:44:27,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3204186.6666666665, ans=0.5 2023-11-26 03:44:42,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480650 2023-11-26 03:44:42,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3204320.0, ans=0.02 2023-11-26 03:44:43,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3204320.0, ans=0.0 2023-11-26 03:44:57,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3204386.6666666665, ans=0.125 2023-11-26 03:45:04,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3204453.3333333335, ans=0.125 2023-11-26 03:45:15,958 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11750, loss[loss=0.05708, simple_loss=0.06767, pruned_loss=0.01306, audio_tagging_loss=0.01019, over 14353.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08944, pruned_loss=0.01253, audio_tagging_loss=0.008911, over 3045378.03 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:45:19,794 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.55 vs. 
limit=10.0 2023-11-26 03:45:31,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3204586.6666666665, ans=0.07 2023-11-26 03:45:38,307 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480700 2023-11-26 03:46:11,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 8.820e+01 9.557e+01 1.032e+02 1.520e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 03:46:11,508 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11800, loss[loss=0.06622, simple_loss=0.08916, pruned_loss=0.01529, audio_tagging_loss=0.006349, over 14799.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09048, pruned_loss=0.01276, audio_tagging_loss=0.008967, over 3041235.63 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:46:28,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2023-11-26 03:46:32,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0 2023-11-26 03:46:34,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480750 2023-11-26 03:46:43,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3204986.6666666665, ans=0.125 2023-11-26 03:46:45,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3205053.3333333335, ans=0.0 2023-11-26 03:46:59,658 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:47:07,440 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11850, loss[loss=0.04595, simple_loss=0.05529, pruned_loss=0.007083, audio_tagging_loss=0.01122, over 15419.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08994, pruned_loss=0.01255, audio_tagging_loss=0.009052, over 3040148.69 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:47:12,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2023-11-26 03:47:22,474 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:47:29,869 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480800 2023-11-26 03:48:03,872 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11900, loss[loss=0.06979, simple_loss=0.09919, pruned_loss=0.01283, audio_tagging_loss=0.007371, over 15750.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08982, pruned_loss=0.01253, audio_tagging_loss=0.009113, over 3049245.85 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:48:04,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. 
limit=6.0 2023-11-26 03:48:04,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 8.863e+01 9.443e+01 1.007e+02 1.384e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-26 03:48:08,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3205520.0, ans=0.0 2023-11-26 03:48:15,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3205586.6666666665, ans=0.125 2023-11-26 03:48:25,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480850 2023-11-26 03:48:46,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3205720.0, ans=0.125 2023-11-26 03:48:47,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-26 03:48:49,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3205786.6666666665, ans=0.025 2023-11-26 03:48:52,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-26 03:48:53,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-26 03:48:59,011 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11950, loss[loss=0.06473, simple_loss=0.0941, pruned_loss=0.009514, audio_tagging_loss=0.008165, over 15599.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0897, pruned_loss=0.01254, audio_tagging_loss=0.00919, over 3042438.76 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:49:03,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-26 03:49:13,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2023-11-26 03:49:18,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3205920.0, ans=15.0 2023-11-26 03:49:20,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3205986.6666666665, ans=0.0 2023-11-26 03:49:21,503 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480900 2023-11-26 03:49:32,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3206053.3333333335, ans=0.125 2023-11-26 03:49:48,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-11-26 03:49:50,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3206120.0, ans=0.125 2023-11-26 03:49:51,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.51 vs. 
limit=15.0 2023-11-26 03:49:53,272 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 12000, loss[loss=0.06424, simple_loss=0.09369, pruned_loss=0.01028, audio_tagging_loss=0.007111, over 15231.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08887, pruned_loss=0.01242, audio_tagging_loss=0.009248, over 3038498.29 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:49:53,273 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 03:50:25,670 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.0579, simple_loss=0.05064, pruned_loss=0.005235, audio_tagging_loss=0.02734, over 4681554.00 frames. 2023-11-26 03:50:25,671 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 03:50:26,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.771e+01 9.492e+01 1.018e+02 1.259e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 03:50:33,437 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-26 03:50:36,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3206253.3333333335, ans=0.1 2023-11-26 03:50:38,985 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2023-11-26 03:50:45,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3206253.3333333335, ans=0.125 2023-11-26 03:50:47,184 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480950 2023-11-26 03:51:24,315 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 0, loss[loss=0.07817, simple_loss=0.09664, pruned_loss=0.01241, audio_tagging_loss=0.01744, over 15658.00 frames. ], tot_loss[loss=0.07817, simple_loss=0.09664, pruned_loss=0.01241, audio_tagging_loss=0.01744, over 15658.00 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:51:24,316 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 03:51:55,676 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05811, simple_loss=0.05068, pruned_loss=0.005302, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 03:51:55,676 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 03:51:55,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3206360.0, ans=0.0 2023-11-26 03:52:13,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3206426.6666666665, ans=0.0 2023-11-26 03:52:16,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3206426.6666666665, ans=0.2 2023-11-26 03:52:35,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2023-11-26 03:52:44,399 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481000 2023-11-26 03:52:51,484 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 50, loss[loss=0.07619, simple_loss=0.1022, pruned_loss=0.01125, audio_tagging_loss=0.01386, over 16993.00 frames. ], tot_loss[loss=0.07441, simple_loss=0.09104, pruned_loss=0.01221, audio_tagging_loss=0.01668, over 695104.00 frames. 
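In the Clipping_scale entries, the reported threshold tracks Clipping_scale times the median of the recent grad-norm distribution: just above, 2.0 x 9.492e+01 gives 1.898e+02, and the same relationship holds for the other quartile lines in this section. A sketch of that relationship; how the recent norms are buffered is an assumption:

```python
import numpy as np

def grad_clip_threshold(recent_grad_norms, clipping_scale=2.0):
    # min / 25% / median / 75% / max, in the order printed by the log
    quartiles = np.quantile(recent_grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    return clipping_scale * quartiles[2], quartiles

# 2.0 x the median 9.492e+01 reported above is 1.8984e+02,
# matching the printed threshold=1.898e+02 up to rounding.
assert abs(2.0 * 9.492e1 - 1.898e2) < 0.1
```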
], batch size: 61, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:52:51,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3206693.3333333335, ans=0.125 2023-11-26 03:53:06,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3206760.0, ans=0.0 2023-11-26 03:53:19,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.787e+01 9.351e+01 1.009e+02 1.085e+02 1.541e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-26 03:53:22,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-26 03:53:24,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2023-11-26 03:53:25,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3206893.3333333335, ans=0.2 2023-11-26 03:53:40,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2023-11-26 03:53:41,045 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481050 2023-11-26 03:53:47,339 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 100, loss[loss=0.07225, simple_loss=0.08991, pruned_loss=0.01138, audio_tagging_loss=0.01592, over 16143.00 frames. ], tot_loss[loss=0.07456, simple_loss=0.09192, pruned_loss=0.01263, audio_tagging_loss=0.01597, over 1217138.16 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:53:53,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3207026.6666666665, ans=0.1 2023-11-26 03:53:56,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3207026.6666666665, ans=0.125 2023-11-26 03:53:58,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207093.3333333335, ans=0.1 2023-11-26 03:53:59,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.25 vs. limit=15.0 2023-11-26 03:54:01,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3207093.3333333335, ans=0.125 2023-11-26 03:54:17,304 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-26 03:54:23,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3207226.6666666665, ans=0.125 2023-11-26 03:54:36,745 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481100 2023-11-26 03:54:43,046 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 150, loss[loss=0.05644, simple_loss=0.06863, pruned_loss=0.009165, audio_tagging_loss=0.01296, over 15235.00 frames. ], tot_loss[loss=0.07181, simple_loss=0.0906, pruned_loss=0.01212, audio_tagging_loss=0.01439, over 1626624.70 frames. 
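The ScheduledFloat lines record hyperparameters (skip rates, balancer probabilities, dropout rates) that are functions of batch_count rather than constants. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) knots and clamping outside the range; the real class in scaling.py has more machinery:

```python
def scheduled_float(batch_count, points):
    # points: (batch_count, value) knots; clamp outside their range.
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Long after the schedule's last knot the value sits at its floor, which
# is why so many probs above read ans=0.125 (these knots are hypothetical):
print(scheduled_float(3_207_360.0, [(0.0, 0.3), (8000.0, 0.125)]))  # 0.125
```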
], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:54:46,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3207360.0, ans=0.0 2023-11-26 03:54:51,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3207360.0, ans=0.125 2023-11-26 03:54:59,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3207426.6666666665, ans=0.0 2023-11-26 03:55:00,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3207426.6666666665, ans=0.0 2023-11-26 03:55:10,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.007e+01 9.477e+01 1.014e+02 1.465e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 03:55:20,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.09 vs. limit=15.0 2023-11-26 03:55:23,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3207560.0, ans=0.0 2023-11-26 03:55:31,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3207626.6666666665, ans=0.125 2023-11-26 03:55:32,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481150 2023-11-26 03:55:38,446 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 200, loss[loss=0.08655, simple_loss=0.1235, pruned_loss=0.01617, audio_tagging_loss=0.008626, over 16133.00 frames. ], tot_loss[loss=0.07104, simple_loss=0.0916, pruned_loss=0.01242, audio_tagging_loss=0.01282, over 1951486.00 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:55:39,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3207693.3333333335, ans=0.0 2023-11-26 03:55:48,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=22.5 2023-11-26 03:55:49,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3207760.0, ans=0.0 2023-11-26 03:56:04,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3207826.6666666665, ans=0.04949747468305833 2023-11-26 03:56:12,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3207893.3333333335, ans=0.125 2023-11-26 03:56:28,225 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481200 2023-11-26 03:56:35,401 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 250, loss[loss=0.06119, simple_loss=0.08503, pruned_loss=0.008744, audio_tagging_loss=0.009934, over 15658.00 frames. ], tot_loss[loss=0.06977, simple_loss=0.09122, pruned_loss=0.01253, audio_tagging_loss=0.01163, over 2194284.28 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:56:37,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3208026.6666666665, ans=0.2 2023-11-26 03:56:59,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. 
limit=22.5 2023-11-26 03:57:00,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0 2023-11-26 03:57:04,328 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 8.798e+01 9.430e+01 1.056e+02 1.787e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 03:57:13,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3208226.6666666665, ans=10.0 2023-11-26 03:57:24,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481250 2023-11-26 03:57:25,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3208293.3333333335, ans=0.0 2023-11-26 03:57:31,757 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 300, loss[loss=0.08779, simple_loss=0.1221, pruned_loss=0.02111, audio_tagging_loss=0.005632, over 16218.00 frames. ], tot_loss[loss=0.06942, simple_loss=0.09194, pruned_loss=0.01272, audio_tagging_loss=0.01073, over 2387601.17 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:57:57,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3208493.3333333335, ans=0.125 2023-11-26 03:57:58,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-26 03:58:07,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3208560.0, ans=0.1 2023-11-26 03:58:10,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.89 vs. limit=15.0 2023-11-26 03:58:11,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3208560.0, ans=0.0 2023-11-26 03:58:20,674 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481300 2023-11-26 03:58:26,979 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 350, loss[loss=0.065, simple_loss=0.0891, pruned_loss=0.01214, audio_tagging_loss=0.008308, over 14711.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09107, pruned_loss=0.01265, audio_tagging_loss=0.0101, over 2532202.83 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 8.0 2023-11-26 03:58:29,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3208693.3333333335, ans=0.125 2023-11-26 03:58:30,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3208693.3333333335, ans=0.0 2023-11-26 03:58:36,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3208693.3333333335, ans=0.125 2023-11-26 03:58:53,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-26 03:58:57,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.469e+01 9.311e+01 1.023e+02 1.499e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:58:57,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3208826.6666666665, ans=0.2 2023-11-26 03:58:59,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-26 03:59:02,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3208893.3333333335, ans=0.125 2023-11-26 03:59:06,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3208893.3333333335, ans=0.05 2023-11-26 03:59:16,460 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-26 03:59:21,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2023-11-26 03:59:22,712 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 400, loss[loss=0.0826, simple_loss=0.1171, pruned_loss=0.01631, audio_tagging_loss=0.007765, over 15072.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09174, pruned_loss=0.01275, audio_tagging_loss=0.009734, over 2647886.35 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:59:24,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3209026.6666666665, ans=0.125 2023-11-26 03:59:43,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3209093.3333333335, ans=0.2 2023-11-26 03:59:44,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3209160.0, ans=0.125 2023-11-26 03:59:58,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3209226.6666666665, ans=0.1 2023-11-26 04:00:11,916 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-26 04:00:12,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.34 vs. 
limit=22.5 2023-11-26 04:00:13,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3209293.3333333335, ans=0.0 2023-11-26 04:00:19,426 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 450, loss[loss=0.06698, simple_loss=0.09222, pruned_loss=0.01194, audio_tagging_loss=0.008932, over 14962.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09079, pruned_loss=0.01251, audio_tagging_loss=0.009447, over 2730306.05 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:00:41,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3209493.3333333335, ans=0.2 2023-11-26 04:00:48,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.492e+01 9.023e+01 9.553e+01 1.244e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-26 04:00:54,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3209560.0, ans=0.0 2023-11-26 04:01:08,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-26 04:01:08,400 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:01:12,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0 2023-11-26 04:01:14,680 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 500, loss[loss=0.06875, simple_loss=0.09499, pruned_loss=0.01467, audio_tagging_loss=0.006587, over 16086.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09019, pruned_loss=0.01248, audio_tagging_loss=0.009337, over 2801773.37 frames. ], batch size: 61, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:01:30,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2023-11-26 04:01:49,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3209893.3333333335, ans=0.1 2023-11-26 04:01:49,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=22.5 2023-11-26 04:02:00,905 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:02:00,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3209960.0, ans=0.125 2023-11-26 04:02:04,063 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-26 04:02:10,857 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 550, loss[loss=0.05425, simple_loss=0.07462, pruned_loss=0.009349, audio_tagging_loss=0.007584, over 15113.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09035, pruned_loss=0.01254, audio_tagging_loss=0.009269, over 2854530.48 frames. 
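The Whitening lines compare a whiteness statistic of a module's activations against a limit; the "metric=X vs. limit=Y" readings only matter once the metric exceeds the limit. One common form of such a metric equals 1.0 for a perfectly isotropic covariance and grows as the spectrum becomes lopsided; the sketch below uses that form as an assumption, not necessarily the exact formula in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels), assumed zero-mean.
    num_channels = x.shape[1]
    cov = (x.T @ x) / x.shape[0]
    # Equals 1.0 when all eigenvalues of cov are equal; larger otherwise.
    return float(num_channels * (cov * cov).sum() / cov.trace() ** 2)

x = torch.randn(10_000, 256)   # near-white activations
print(whitening_metric(x))     # close to 1.0, far below a limit of 15.0
```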
], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:02:32,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3210160.0, ans=0.0 2023-11-26 04:02:41,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.521e+01 9.213e+01 9.979e+01 1.259e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 04:02:46,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3210226.6666666665, ans=0.125 2023-11-26 04:02:58,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3210293.3333333335, ans=0.1 2023-11-26 04:02:59,641 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-26 04:03:06,636 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 600, loss[loss=0.0764, simple_loss=0.0929, pruned_loss=0.0187, audio_tagging_loss=0.01125, over 14665.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09069, pruned_loss=0.01256, audio_tagging_loss=0.009279, over 2893316.44 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:03:11,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3210360.0, ans=0.125 2023-11-26 04:03:18,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3210426.6666666665, ans=0.125 2023-11-26 04:03:24,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0 2023-11-26 04:03:33,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3210493.3333333335, ans=0.0 2023-11-26 04:03:55,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-26 04:03:55,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2023-11-26 04:03:55,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.08 vs. limit=10.0 2023-11-26 04:04:01,750 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 650, loss[loss=0.09557, simple_loss=0.1297, pruned_loss=0.02417, audio_tagging_loss=0.006536, over 15241.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09114, pruned_loss=0.01256, audio_tagging_loss=0.009249, over 2924099.26 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:04:02,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2023-11-26 04:04:06,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3210693.3333333335, ans=0.125 2023-11-26 04:04:27,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2023-11-26 04:04:31,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.37 vs. 
limit=22.5 2023-11-26 04:04:32,177 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.848e+01 9.324e+01 9.991e+01 1.249e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 04:04:50,507 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-26 04:04:53,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3210960.0, ans=0.0 2023-11-26 04:04:57,404 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 700, loss[loss=0.07126, simple_loss=0.09792, pruned_loss=0.01352, audio_tagging_loss=0.008783, over 15316.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09166, pruned_loss=0.01268, audio_tagging_loss=0.00912, over 2953707.17 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:05:01,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3211026.6666666665, ans=0.1 2023-11-26 04:05:11,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3211093.3333333335, ans=0.0 2023-11-26 04:05:15,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3211093.3333333335, ans=0.2 2023-11-26 04:05:32,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0 2023-11-26 04:05:39,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3211226.6666666665, ans=0.0 2023-11-26 04:05:41,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2023-11-26 04:05:44,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3211293.3333333335, ans=0.0 2023-11-26 04:05:45,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.99 vs. limit=15.0 2023-11-26 04:05:46,373 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-26 04:05:52,697 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 750, loss[loss=0.05375, simple_loss=0.07304, pruned_loss=0.01105, audio_tagging_loss=0.006177, over 13905.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09203, pruned_loss=0.01271, audio_tagging_loss=0.009056, over 2979336.89 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:06:00,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. 
limit=15.0 2023-11-26 04:06:16,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3211493.3333333335, ans=0.125 2023-11-26 04:06:23,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.760e+01 9.267e+01 1.006e+02 1.673e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 04:06:27,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3211560.0, ans=0.1 2023-11-26 04:06:33,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3211560.0, ans=0.04949747468305833 2023-11-26 04:06:33,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3211560.0, ans=0.125 2023-11-26 04:06:41,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-26 04:06:41,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-26 04:06:41,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3211626.6666666665, ans=0.125 2023-11-26 04:06:48,716 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 800, loss[loss=0.07435, simple_loss=0.1007, pruned_loss=0.01256, audio_tagging_loss=0.01145, over 15611.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09261, pruned_loss=0.0129, audio_tagging_loss=0.008997, over 2994127.36 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:06:53,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3211693.3333333335, ans=0.125 2023-11-26 04:07:37,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-26 04:07:44,318 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 850, loss[loss=0.05528, simple_loss=0.07404, pruned_loss=0.009758, audio_tagging_loss=0.008502, over 14983.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.09282, pruned_loss=0.01303, audio_tagging_loss=0.009028, over 3003546.68 frames. 
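grad_scale dropping from 32.0 to 16.0 to 8.0 and climbing back to 32.0 across the surrounding entries is the signature of dynamic loss scaling under fp16: the scale is halved when a step produces inf/nan gradients and grows back after a run of clean steps. A sketch with torch's stock scaler; icefall's handling and the growth interval here are assumptions:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the grad_scale first seen above
    backoff_factor=0.5,    # 32.0 -> 16.0 -> 8.0 on overflowing steps
    growth_factor=2.0,     # 8.0 -> 16.0 -> 32.0 after clean runs
    growth_interval=2000,  # clean steps required before growing (assumed)
)
```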
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:07:44,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-26 04:07:48,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-26 04:07:51,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-26 04:08:14,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.719e+01 9.497e+01 1.051e+02 1.257e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 04:08:15,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3212160.0, ans=0.04949747468305833 2023-11-26 04:08:16,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3212226.6666666665, ans=0.125 2023-11-26 04:08:21,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2023-11-26 04:08:26,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3212226.6666666665, ans=0.2 2023-11-26 04:08:29,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3212293.3333333335, ans=15.0 2023-11-26 04:08:32,631 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-26 04:08:35,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3212293.3333333335, ans=0.04949747468305833 2023-11-26 04:08:38,957 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 900, loss[loss=0.06433, simple_loss=0.09371, pruned_loss=0.008286, audio_tagging_loss=0.009189, over 16599.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09239, pruned_loss=0.01291, audio_tagging_loss=0.009119, over 3017963.22 frames. 
], batch size: 61, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:08:52,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3212426.6666666665, ans=0.0 2023-11-26 04:09:12,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3212560.0, ans=0.0 2023-11-26 04:09:12,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3212560.0, ans=0.2 2023-11-26 04:09:20,874 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:09:24,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3212626.6666666665, ans=0.0 2023-11-26 04:09:25,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3212626.6666666665, ans=0.1 2023-11-26 04:09:27,752 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-26 04:09:32,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3212626.6666666665, ans=0.125 2023-11-26 04:09:34,167 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 950, loss[loss=0.06362, simple_loss=0.08699, pruned_loss=0.01168, audio_tagging_loss=0.008443, over 15258.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09193, pruned_loss=0.01286, audio_tagging_loss=0.009097, over 3025998.05 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:09:44,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3212760.0, ans=0.015 2023-11-26 04:09:48,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3212760.0, ans=0.125 2023-11-26 04:10:04,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.675e+01 9.421e+01 1.013e+02 1.384e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 04:10:23,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-26 04:10:30,170 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1000, loss[loss=0.05256, simple_loss=0.06701, pruned_loss=0.008325, audio_tagging_loss=0.01073, over 13839.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09144, pruned_loss=0.01284, audio_tagging_loss=0.008971, over 3027294.79 frames. ], batch size: 52, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:10:32,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=15.0 2023-11-26 04:10:38,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2023-11-26 04:10:49,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-11-26 04:10:54,208 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:11:15,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3213293.3333333335, ans=0.0 2023-11-26 04:11:19,746 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-26 04:11:22,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3213293.3333333335, ans=0.0 2023-11-26 04:11:26,311 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1050, loss[loss=0.07124, simple_loss=0.09768, pruned_loss=0.01376, audio_tagging_loss=0.008634, over 14352.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09077, pruned_loss=0.01267, audio_tagging_loss=0.008884, over 3028671.59 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:11:31,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-11-26 04:11:42,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3213426.6666666665, ans=0.125 2023-11-26 04:11:50,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3213493.3333333335, ans=0.1 2023-11-26 04:11:57,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.643e+01 9.285e+01 1.025e+02 1.343e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 04:12:12,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3213626.6666666665, ans=0.1 2023-11-26 04:12:12,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3213626.6666666665, ans=0.125 2023-11-26 04:12:15,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3213626.6666666665, ans=0.125 2023-11-26 04:12:16,594 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482050 2023-11-26 04:12:22,922 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1100, loss[loss=0.06679, simple_loss=0.0926, pruned_loss=0.01297, audio_tagging_loss=0.007526, over 14957.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09013, pruned_loss=0.01246, audio_tagging_loss=0.008841, over 3036210.53 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:12:25,123 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 04:12:41,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3213760.0, ans=0.125 2023-11-26 04:12:50,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3213826.6666666665, ans=0.1 2023-11-26 04:12:58,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3213893.3333333335, ans=0.0 2023-11-26 04:13:11,053 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482100 2023-11-26 04:13:17,887 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1150, loss[loss=0.05528, simple_loss=0.07595, pruned_loss=0.0102, audio_tagging_loss=0.00711, over 14742.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08979, pruned_loss=0.01246, audio_tagging_loss=0.008853, over 3035337.62 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:13:19,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3214026.6666666665, ans=0.125 2023-11-26 04:13:24,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3214026.6666666665, ans=0.125 2023-11-26 04:13:35,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-26 04:13:39,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3214160.0, ans=0.0 2023-11-26 04:13:40,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3214160.0, ans=0.125 2023-11-26 04:13:48,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.737e+01 9.281e+01 9.829e+01 1.139e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:13:54,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.87 vs. limit=15.0 2023-11-26 04:13:55,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3214226.6666666665, ans=0.1 2023-11-26 04:13:56,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2023-11-26 04:14:06,866 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482150 2023-11-26 04:14:07,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2023-11-26 04:14:13,200 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1200, loss[loss=0.07323, simple_loss=0.0985, pruned_loss=0.01418, audio_tagging_loss=0.009801, over 15235.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08966, pruned_loss=0.01255, audio_tagging_loss=0.008805, over 3030794.54 frames. 
], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:14:26,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3214426.6666666665, ans=0.09899494936611666 2023-11-26 04:14:32,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0 2023-11-26 04:14:33,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3214426.6666666665, ans=0.125 2023-11-26 04:14:44,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-26 04:14:51,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3214560.0, ans=0.125 2023-11-26 04:15:02,002 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482200 2023-11-26 04:15:02,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3214626.6666666665, ans=0.0 2023-11-26 04:15:07,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3214626.6666666665, ans=0.125 2023-11-26 04:15:09,165 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1250, loss[loss=0.08122, simple_loss=0.1188, pruned_loss=0.01555, audio_tagging_loss=0.006269, over 16016.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08925, pruned_loss=0.01251, audio_tagging_loss=0.008796, over 3036985.91 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:15:09,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3214693.3333333335, ans=0.125 2023-11-26 04:15:39,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.848e+01 9.499e+01 1.001e+02 1.397e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 04:15:57,558 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482250 2023-11-26 04:16:03,850 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1300, loss[loss=0.06825, simple_loss=0.1012, pruned_loss=0.009628, audio_tagging_loss=0.008002, over 14303.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08905, pruned_loss=0.01239, audio_tagging_loss=0.008756, over 3031654.48 frames. ], batch size: 52, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:16:24,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3215093.3333333335, ans=0.125 2023-11-26 04:16:25,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3215160.0, ans=0.2 2023-11-26 04:16:35,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3215160.0, ans=0.95 2023-11-26 04:16:36,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215160.0, ans=0.1 2023-11-26 04:16:36,314 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=12.0 2023-11-26 04:16:43,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3215226.6666666665, ans=0.125 2023-11-26 04:16:53,399 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482300 2023-11-26 04:17:00,349 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1350, loss[loss=0.06231, simple_loss=0.08428, pruned_loss=0.0125, audio_tagging_loss=0.007669, over 16366.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08911, pruned_loss=0.01245, audio_tagging_loss=0.008759, over 3038289.99 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:17:04,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3215360.0, ans=0.125 2023-11-26 04:17:11,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215426.6666666665, ans=0.1 2023-11-26 04:17:25,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3215493.3333333335, ans=0.125 2023-11-26 04:17:31,340 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.487e+01 8.991e+01 9.732e+01 2.025e+02, threshold=1.798e+02, percent-clipped=1.0 2023-11-26 04:17:38,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3215560.0, ans=0.1 2023-11-26 04:17:41,001 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:17:49,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482350 2023-11-26 04:17:50,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3215626.6666666665, ans=0.125 2023-11-26 04:17:52,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3215626.6666666665, ans=0.125 2023-11-26 04:17:56,837 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1400, loss[loss=0.06899, simple_loss=0.09754, pruned_loss=0.01348, audio_tagging_loss=0.006743, over 14524.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08995, pruned_loss=0.01258, audio_tagging_loss=0.008813, over 3036202.19 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:18:10,816 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:18:39,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2023-11-26 04:18:45,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482400 2023-11-26 04:18:52,445 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1450, loss[loss=0.0704, simple_loss=0.09851, pruned_loss=0.01401, audio_tagging_loss=0.007141, over 15298.00 frames. 
], tot_loss[loss=0.06733, simple_loss=0.09173, pruned_loss=0.01276, audio_tagging_loss=0.008704, over 3048761.50 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:18:53,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3216026.6666666665, ans=0.125 2023-11-26 04:19:05,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3216093.3333333335, ans=0.2 2023-11-26 04:19:07,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2023-11-26 04:19:16,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2023-11-26 04:19:22,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3216160.0, ans=0.0 2023-11-26 04:19:24,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.656e+01 9.210e+01 9.975e+01 1.432e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:19:30,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-26 04:19:37,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3216293.3333333335, ans=0.0 2023-11-26 04:19:41,280 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482450 2023-11-26 04:19:45,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3216293.3333333335, ans=0.125 2023-11-26 04:19:48,061 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1500, loss[loss=0.05802, simple_loss=0.07468, pruned_loss=0.01037, audio_tagging_loss=0.01031, over 15726.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.0916, pruned_loss=0.01283, audio_tagging_loss=0.008846, over 3051282.40 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:20:05,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3216426.6666666665, ans=0.125 2023-11-26 04:20:06,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3216426.6666666665, ans=0.0 2023-11-26 04:20:18,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:18,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3216493.3333333335, ans=0.0 2023-11-26 04:20:32,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3216626.6666666665, ans=0.125 2023-11-26 04:20:37,461 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482500 2023-11-26 04:20:40,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-26 04:20:44,830 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1550, loss[loss=0.06707, simple_loss=0.08399, pruned_loss=0.01298, audio_tagging_loss=0.0121, over 15459.00 frames. 
], tot_loss[loss=0.06708, simple_loss=0.09129, pruned_loss=0.01262, audio_tagging_loss=0.008814, over 3049420.92 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:20:58,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3216760.0, ans=0.0 2023-11-26 04:21:04,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3216760.0, ans=0.0 2023-11-26 04:21:11,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2023-11-26 04:21:13,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3216826.6666666665, ans=0.0 2023-11-26 04:21:15,035 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.764e+01 9.258e+01 1.010e+02 1.215e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 04:21:23,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3216893.3333333335, ans=0.125 2023-11-26 04:21:28,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3216960.0, ans=0.125 2023-11-26 04:21:33,701 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482550 2023-11-26 04:21:40,014 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1600, loss[loss=0.07121, simple_loss=0.09904, pruned_loss=0.01262, audio_tagging_loss=0.009065, over 14383.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.0909, pruned_loss=0.01255, audio_tagging_loss=0.008856, over 3052023.93 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:21:45,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3217026.6666666665, ans=0.125 2023-11-26 04:21:47,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3217026.6666666665, ans=0.125 2023-11-26 04:22:00,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-26 04:22:13,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3217226.6666666665, ans=0.0 2023-11-26 04:22:28,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482600 2023-11-26 04:22:30,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2023-11-26 04:22:36,040 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1650, loss[loss=0.06191, simple_loss=0.08342, pruned_loss=0.01012, audio_tagging_loss=0.01008, over 16285.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09111, pruned_loss=0.01261, audio_tagging_loss=0.008881, over 3051664.77 frames. 
], batch size: 63, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:22:37,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3217360.0, ans=0.125 2023-11-26 04:23:00,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3217493.3333333335, ans=0.125 2023-11-26 04:23:07,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.467e+01 9.120e+01 9.826e+01 1.173e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 04:23:24,378 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482650 2023-11-26 04:23:31,213 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1700, loss[loss=0.06749, simple_loss=0.09952, pruned_loss=0.01003, audio_tagging_loss=0.007702, over 15341.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09141, pruned_loss=0.01254, audio_tagging_loss=0.008957, over 3057506.00 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:23:36,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3217693.3333333335, ans=0.2 2023-11-26 04:24:04,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3217893.3333333335, ans=0.0 2023-11-26 04:24:10,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217893.3333333335, ans=0.1 2023-11-26 04:24:18,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3217960.0, ans=0.035 2023-11-26 04:24:18,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3217960.0, ans=0.0 2023-11-26 04:24:20,453 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482700 2023-11-26 04:24:26,774 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1750, loss[loss=0.06557, simple_loss=0.08851, pruned_loss=0.01133, audio_tagging_loss=0.009982, over 14967.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09093, pruned_loss=0.0124, audio_tagging_loss=0.008939, over 3055283.38 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:24:48,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3218160.0, ans=0.04949747468305833 2023-11-26 04:24:59,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.710e+01 9.428e+01 1.004e+02 1.247e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 04:25:03,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3218226.6666666665, ans=22.5 2023-11-26 04:25:13,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3218293.3333333335, ans=0.04949747468305833 2023-11-26 04:25:15,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482750 2023-11-26 04:25:22,297 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1800, loss[loss=0.06124, simple_loss=0.08856, pruned_loss=0.01031, audio_tagging_loss=0.006653, over 15599.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09045, pruned_loss=0.01245, audio_tagging_loss=0.008839, over 3056258.65 frames. 
], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:25:37,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3218426.6666666665, ans=0.125 2023-11-26 04:25:48,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3218493.3333333335, ans=0.2 2023-11-26 04:25:56,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3218560.0, ans=0.125 2023-11-26 04:26:08,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3218626.6666666665, ans=0.125 2023-11-26 04:26:10,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3218626.6666666665, ans=0.0 2023-11-26 04:26:11,611 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482800 2023-11-26 04:26:11,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3218626.6666666665, ans=0.07 2023-11-26 04:26:16,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3218626.6666666665, ans=0.2 2023-11-26 04:26:18,179 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1850, loss[loss=0.07369, simple_loss=0.1004, pruned_loss=0.01549, audio_tagging_loss=0.007991, over 14982.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09071, pruned_loss=0.01253, audio_tagging_loss=0.008766, over 3053758.97 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:26:29,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3218760.0, ans=0.2 2023-11-26 04:26:32,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3218760.0, ans=0.0 2023-11-26 04:26:40,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3218826.6666666665, ans=0.125 2023-11-26 04:26:40,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3218826.6666666665, ans=0.2 2023-11-26 04:26:51,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.663e+01 9.346e+01 1.025e+02 1.313e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:27:01,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3218893.3333333335, ans=0.0 2023-11-26 04:27:06,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3218960.0, ans=0.5 2023-11-26 04:27:08,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482850 2023-11-26 04:27:15,299 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1900, loss[loss=0.07313, simple_loss=0.09768, pruned_loss=0.01398, audio_tagging_loss=0.01031, over 15288.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09063, pruned_loss=0.01248, audio_tagging_loss=0.008685, over 3053327.81 frames. 
], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:27:36,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3219160.0, ans=0.125 2023-11-26 04:27:36,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3219160.0, ans=0.0 2023-11-26 04:27:40,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0 2023-11-26 04:27:48,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3219226.6666666665, ans=0.0 2023-11-26 04:27:53,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:57,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:28:00,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-26 04:28:04,356 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482900 2023-11-26 04:28:08,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3219293.3333333335, ans=0.1 2023-11-26 04:28:11,346 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1950, loss[loss=0.05916, simple_loss=0.07798, pruned_loss=0.009996, audio_tagging_loss=0.01017, over 14320.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09102, pruned_loss=0.01264, audio_tagging_loss=0.00869, over 3053487.97 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:28:11,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3219360.0, ans=0.0 2023-11-26 04:28:17,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3219360.0, ans=0.0 2023-11-26 04:28:43,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.481e+01 9.198e+01 9.869e+01 1.193e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 04:29:00,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482950 2023-11-26 04:29:02,519 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:29:06,468 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2000, loss[loss=0.05386, simple_loss=0.07074, pruned_loss=0.008407, audio_tagging_loss=0.01009, over 16760.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08995, pruned_loss=0.01257, audio_tagging_loss=0.008829, over 3047412.81 frames. 
], batch size: 62, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:29:06,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3219693.3333333335, ans=0.0 2023-11-26 04:29:07,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3219693.3333333335, ans=0.125 2023-11-26 04:29:07,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3219693.3333333335, ans=0.1 2023-11-26 04:29:29,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3219826.6666666665, ans=0.0 2023-11-26 04:29:56,801 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483000 2023-11-26 04:29:58,352 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. limit=22.5 2023-11-26 04:30:03,467 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2050, loss[loss=0.06476, simple_loss=0.09085, pruned_loss=0.009768, audio_tagging_loss=0.009565, over 15760.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09133, pruned_loss=0.01276, audio_tagging_loss=0.008751, over 3046684.37 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:30:04,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2023-11-26 04:30:16,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3220093.3333333335, ans=0.0 2023-11-26 04:30:18,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-26 04:30:24,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3220093.3333333335, ans=0.0 2023-11-26 04:30:31,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3220160.0, ans=0.125 2023-11-26 04:30:36,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.605e+01 9.268e+01 1.003e+02 1.182e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 04:30:40,146 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:30:44,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3220226.6666666665, ans=0.125 2023-11-26 04:30:45,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3220226.6666666665, ans=0.125 2023-11-26 04:30:53,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483050 2023-11-26 04:30:57,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3220293.3333333335, ans=0.125 2023-11-26 04:30:59,854 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2100, loss[loss=0.07368, simple_loss=0.1101, pruned_loss=0.01301, audio_tagging_loss=0.005628, over 15362.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09018, pruned_loss=0.01253, audio_tagging_loss=0.008757, over 3045897.83 frames. 
], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:31:00,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3220360.0, ans=0.125 2023-11-26 04:31:10,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-26 04:31:43,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3220626.6666666665, ans=0.125 2023-11-26 04:31:49,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483100 2023-11-26 04:31:49,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3220626.6666666665, ans=0.0 2023-11-26 04:31:55,334 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2150, loss[loss=0.06536, simple_loss=0.08612, pruned_loss=0.01454, audio_tagging_loss=0.007756, over 14717.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08996, pruned_loss=0.01259, audio_tagging_loss=0.008749, over 3042869.62 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:32:12,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3220760.0, ans=0.125 2023-11-26 04:32:17,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-11-26 04:32:20,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3220826.6666666665, ans=0.0 2023-11-26 04:32:21,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3220826.6666666665, ans=0.2 2023-11-26 04:32:28,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.606e+01 9.465e+01 1.020e+02 1.219e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 04:32:29,690 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:32:41,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3220960.0, ans=0.0 2023-11-26 04:32:45,728 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483150 2023-11-26 04:32:49,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3220960.0, ans=0.125 2023-11-26 04:32:52,091 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2200, loss[loss=0.08514, simple_loss=0.1193, pruned_loss=0.01626, audio_tagging_loss=0.009247, over 15552.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09026, pruned_loss=0.01252, audio_tagging_loss=0.00877, over 3049768.24 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:32:56,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-11-26 04:33:01,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3221093.3333333335, ans=0.125 2023-11-26 04:33:07,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3221093.3333333335, ans=0.125 2023-11-26 04:33:12,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3221093.3333333335, ans=0.125 2023-11-26 04:33:14,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3221160.0, ans=0.0 2023-11-26 04:33:26,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3221226.6666666665, ans=0.95 2023-11-26 04:33:41,022 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483200 2023-11-26 04:33:46,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3221360.0, ans=0.1 2023-11-26 04:33:47,555 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2250, loss[loss=0.05472, simple_loss=0.07549, pruned_loss=0.007269, audio_tagging_loss=0.009705, over 14932.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09044, pruned_loss=0.01255, audio_tagging_loss=0.008759, over 3041756.97 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:33:54,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2023-11-26 04:34:12,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3221493.3333333335, ans=0.125 2023-11-26 04:34:15,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3221493.3333333335, ans=0.125 2023-11-26 04:34:21,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.817e+01 9.211e+01 9.808e+01 1.275e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:34:32,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3221626.6666666665, ans=0.0 2023-11-26 04:34:37,731 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483250 2023-11-26 04:34:38,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3221626.6666666665, ans=0.0 2023-11-26 04:34:42,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3221626.6666666665, ans=0.0 2023-11-26 04:34:44,124 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2300, loss[loss=0.08613, simple_loss=0.1251, pruned_loss=0.01639, audio_tagging_loss=0.007204, over 15520.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09025, pruned_loss=0.01243, audio_tagging_loss=0.008781, over 3044040.27 frames. 
], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:34:58,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3221760.0, ans=0.2 2023-11-26 04:35:27,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3221960.0, ans=0.1 2023-11-26 04:35:32,586 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:35:32,637 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483300 2023-11-26 04:35:32,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3221960.0, ans=0.125 2023-11-26 04:35:40,081 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2350, loss[loss=0.08058, simple_loss=0.1142, pruned_loss=0.01369, audio_tagging_loss=0.009809, over 16121.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09011, pruned_loss=0.0123, audio_tagging_loss=0.008907, over 3044959.71 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:35:59,454 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:36:12,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3222226.6666666665, ans=0.125 2023-11-26 04:36:13,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.775e+01 9.413e+01 9.957e+01 1.252e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 04:36:18,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3222226.6666666665, ans=0.125 2023-11-26 04:36:20,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3222226.6666666665, ans=0.125 2023-11-26 04:36:29,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483350 2023-11-26 04:36:35,664 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2400, loss[loss=0.06339, simple_loss=0.08846, pruned_loss=0.008326, audio_tagging_loss=0.01083, over 15006.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09054, pruned_loss=0.01233, audio_tagging_loss=0.008999, over 3045331.69 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:36:37,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3222360.0, ans=0.2 2023-11-26 04:36:37,138 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. 
limit=15.0 2023-11-26 04:36:39,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3222360.0, ans=0.1 2023-11-26 04:36:48,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3222426.6666666665, ans=0.0 2023-11-26 04:36:55,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3222426.6666666665, ans=0.0 2023-11-26 04:37:22,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3222626.6666666665, ans=0.0 2023-11-26 04:37:24,490 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483400 2023-11-26 04:37:25,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2023-11-26 04:37:32,128 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2450, loss[loss=0.06534, simple_loss=0.08639, pruned_loss=0.01231, audio_tagging_loss=0.009832, over 15164.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0901, pruned_loss=0.01223, audio_tagging_loss=0.009054, over 3045925.60 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:37:38,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3222693.3333333335, ans=0.0 2023-11-26 04:37:43,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=12.0 2023-11-26 04:38:04,688 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=12.0 2023-11-26 04:38:05,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.820e+01 9.460e+01 9.914e+01 1.229e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 04:38:16,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3222960.0, ans=0.0 2023-11-26 04:38:19,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3222960.0, ans=0.1 2023-11-26 04:38:20,993 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483450 2023-11-26 04:38:28,468 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2500, loss[loss=0.06702, simple_loss=0.08388, pruned_loss=0.01529, audio_tagging_loss=0.009789, over 14702.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09136, pruned_loss=0.01259, audio_tagging_loss=0.009012, over 3041642.01 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:39:12,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3223293.3333333335, ans=0.125 2023-11-26 04:39:17,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483500 2023-11-26 04:39:17,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3223293.3333333335, ans=0.04949747468305833 2023-11-26 04:39:23,659 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2550, loss[loss=0.0505, simple_loss=0.06934, pruned_loss=0.006335, audio_tagging_loss=0.009489, over 16045.00 frames. 
], tot_loss[loss=0.06787, simple_loss=0.09211, pruned_loss=0.01285, audio_tagging_loss=0.008961, over 3045411.14 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:39:33,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3223426.6666666665, ans=0.125 2023-11-26 04:39:44,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3223426.6666666665, ans=0.125 2023-11-26 04:39:53,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2023-11-26 04:39:58,035 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.655e+01 9.369e+01 9.898e+01 1.233e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:40:02,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3223560.0, ans=0.2 2023-11-26 04:40:03,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3223560.0, ans=0.05 2023-11-26 04:40:10,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223626.6666666665, ans=0.1 2023-11-26 04:40:13,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483550 2023-11-26 04:40:20,118 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2600, loss[loss=0.07324, simple_loss=0.1138, pruned_loss=0.01183, audio_tagging_loss=0.004493, over 14288.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09194, pruned_loss=0.01273, audio_tagging_loss=0.008792, over 3042838.53 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:40:23,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3223693.3333333335, ans=0.125 2023-11-26 04:40:30,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3223760.0, ans=15.0 2023-11-26 04:40:34,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3223760.0, ans=0.0 2023-11-26 04:41:05,675 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=12.0 2023-11-26 04:41:09,394 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483600 2023-11-26 04:41:17,195 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2650, loss[loss=0.06512, simple_loss=0.09746, pruned_loss=0.01032, audio_tagging_loss=0.006068, over 14398.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0912, pruned_loss=0.01271, audio_tagging_loss=0.008823, over 3043570.40 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:41:44,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3224160.0, ans=0.125 2023-11-26 04:41:50,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 8.492e+01 9.203e+01 1.002e+02 1.237e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 04:41:57,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. 
limit=10.0 2023-11-26 04:41:58,728 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2023-11-26 04:42:06,584 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483650 2023-11-26 04:42:12,939 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2700, loss[loss=0.06261, simple_loss=0.08161, pruned_loss=0.01472, audio_tagging_loss=0.007083, over 15489.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08972, pruned_loss=0.01253, audio_tagging_loss=0.008782, over 3039581.90 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:42:20,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3224360.0, ans=0.125 2023-11-26 04:42:36,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3224493.3333333335, ans=15.0 2023-11-26 04:42:37,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3224493.3333333335, ans=0.125 2023-11-26 04:42:49,138 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.05 vs. limit=15.0 2023-11-26 04:43:01,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3224626.6666666665, ans=0.125 2023-11-26 04:43:02,262 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483700 2023-11-26 04:43:08,494 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2750, loss[loss=0.05914, simple_loss=0.08545, pruned_loss=0.00725, audio_tagging_loss=0.009167, over 15643.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08924, pruned_loss=0.0125, audio_tagging_loss=0.008808, over 3037671.87 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:43:33,774 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2023-11-26 04:43:40,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3224826.6666666665, ans=0.1 2023-11-26 04:43:43,678 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.913e+01 9.370e+01 9.874e+01 1.312e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:43:50,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2023-11-26 04:43:55,854 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:43:57,986 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483750 2023-11-26 04:44:04,796 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2800, loss[loss=0.06004, simple_loss=0.07672, pruned_loss=0.009565, audio_tagging_loss=0.01211, over 14903.00 frames. 
], tot_loss[loss=0.06609, simple_loss=0.08954, pruned_loss=0.01256, audio_tagging_loss=0.008767, over 3036628.07 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:44:17,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3225093.3333333335, ans=0.0 2023-11-26 04:44:20,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3225093.3333333335, ans=0.125 2023-11-26 04:44:21,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3225093.3333333335, ans=0.0 2023-11-26 04:44:31,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. limit=10.0 2023-11-26 04:44:36,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3225160.0, ans=0.125 2023-11-26 04:44:39,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-26 04:44:39,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3225226.6666666665, ans=0.1 2023-11-26 04:44:52,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3225293.3333333335, ans=0.0 2023-11-26 04:44:55,111 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483800 2023-11-26 04:45:01,805 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2850, loss[loss=0.06482, simple_loss=0.0877, pruned_loss=0.01366, audio_tagging_loss=0.007311, over 15240.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0894, pruned_loss=0.0126, audio_tagging_loss=0.008746, over 3037604.98 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:45:16,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3225426.6666666665, ans=0.125 2023-11-26 04:45:18,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3225426.6666666665, ans=0.125 2023-11-26 04:45:36,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.846e+01 9.347e+01 1.008e+02 1.244e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:45:37,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3225560.0, ans=0.07 2023-11-26 04:45:50,784 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483850 2023-11-26 04:45:56,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3225693.3333333335, ans=0.125 2023-11-26 04:45:57,158 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2900, loss[loss=0.05304, simple_loss=0.07008, pruned_loss=0.009433, audio_tagging_loss=0.008565, over 14333.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08908, pruned_loss=0.0125, audio_tagging_loss=0.008735, over 3038717.08 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:46:17,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3225760.0, ans=0.0 2023-11-26 04:46:17,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3225760.0, ans=0.1 2023-11-26 04:46:37,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=8.0 2023-11-26 04:46:46,763 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483900 2023-11-26 04:46:52,984 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2950, loss[loss=0.07989, simple_loss=0.1044, pruned_loss=0.01325, audio_tagging_loss=0.01443, over 15919.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09004, pruned_loss=0.0126, audio_tagging_loss=0.008744, over 3034583.63 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:47:04,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3226093.3333333335, ans=0.0 2023-11-26 04:47:08,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3226093.3333333335, ans=0.0 2023-11-26 04:47:09,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3226093.3333333335, ans=0.125 2023-11-26 04:47:11,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3226093.3333333335, ans=0.125 2023-11-26 04:47:16,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3226160.0, ans=0.0 2023-11-26 04:47:27,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2023-11-26 04:47:27,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.828e+01 9.406e+01 1.023e+02 1.338e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 04:47:29,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3226226.6666666665, ans=0.0 2023-11-26 04:47:34,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3226226.6666666665, ans=0.125 2023-11-26 04:47:42,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483950 2023-11-26 04:47:49,766 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3000, loss[loss=0.09262, simple_loss=0.1246, pruned_loss=0.02338, audio_tagging_loss=0.006951, over 16385.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09086, pruned_loss=0.01282, audio_tagging_loss=0.008706, over 3042505.75 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:47:49,767 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 04:48:22,232 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05755, simple_loss=0.05064, pruned_loss=0.005227, audio_tagging_loss=0.02701, over 4681554.00 frames. 
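Each [optim.py:476] line is what looks like a five-number summary of recent gradient norms (min, 25%, median, 75%, max) together with the clipping threshold in force. In every such entry in this log the threshold equals Clipping_scale times the middle value, e.g. 2.0 × 9.406e+01 ≈ 1.881e+02 in the entry above; percent-clipped then reports how often recent batches exceeded it. A minimal sketch of that bookkeeping follows; the rolling-buffer length and quantile mechanics are assumptions, not taken from optim.py:

```python
# Minimal sketch, assuming threshold = clipping_scale * median of a rolling
# buffer of gradient norms, which matches the arithmetic visible in the log
# (e.g. 2.0 * 9.406e+01 = 1.881e+02). Buffer size is an illustrative guess.
from collections import deque
import torch

class GradNormMonitor:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms: deque[float] = deque(maxlen=history)
        self.num_clipped = 0
        self.num_seen = 0

    def update(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        qmin, q25, q50, q75, qmax = q.tolist()
        threshold = self.clipping_scale * q50
        self.num_seen += 1
        self.num_clipped += grad_norm > threshold
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              f"{qmin:.3e} {q25:.3e} {q50:.3e} {q75:.3e} {qmax:.3e}, "
              f"threshold={threshold:.3e}, "
              f"percent-clipped={100 * self.num_clipped / self.num_seen:.1f}")
        return threshold
```

The grad_scale field in the batch summaries is a separate quantity: its stepping between 16.0 and 32.0 in this stretch is consistent with an AMP-style loss scale that is halved on overflow and doubled back after a run of stable steps.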
2023-11-26 04:48:22,233 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 04:48:23,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3226360.0, ans=0.2 2023-11-26 04:49:08,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3226626.6666666665, ans=0.125 2023-11-26 04:49:11,256 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484000 2023-11-26 04:49:18,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3226626.6666666665, ans=0.0 2023-11-26 04:49:20,474 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3050, loss[loss=0.09888, simple_loss=0.1369, pruned_loss=0.02216, audio_tagging_loss=0.008262, over 16810.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09211, pruned_loss=0.01308, audio_tagging_loss=0.008805, over 3041474.75 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:49:52,013 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:49:55,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.733e+01 9.255e+01 1.004e+02 1.259e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 04:49:55,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3226893.3333333335, ans=0.035 2023-11-26 04:49:59,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226893.3333333335, ans=0.1 2023-11-26 04:50:10,305 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484050 2023-11-26 04:50:17,060 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3100, loss[loss=0.06412, simple_loss=0.0906, pruned_loss=0.01138, audio_tagging_loss=0.007439, over 14772.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09121, pruned_loss=0.01295, audio_tagging_loss=0.008868, over 3037994.55 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:50:17,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3227026.6666666665, ans=0.125 2023-11-26 04:50:34,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3227093.3333333335, ans=0.125 2023-11-26 04:50:48,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3227160.0, ans=0.5 2023-11-26 04:51:06,107 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484100 2023-11-26 04:51:12,456 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3150, loss[loss=0.04365, simple_loss=0.06134, pruned_loss=0.004508, audio_tagging_loss=0.008468, over 14315.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09167, pruned_loss=0.01284, audio_tagging_loss=0.008862, over 3041094.35 frames. 
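], batch size: 55, lr: 1.65e-03, grad_scale: 16.0

The WARNING at 04:49:52 above (and its near-duplicates elsewhere in this log) always shows the same failure mode: a 1.0-second AudioSet clip yields 100 feature frames, about 23 frames after the encoder's roughly 4x subsampling, one fewer than its 24 placeholder tokens, so the cut is excluded, since a transducer cannot align N tokens to fewer than N encoder frames. A sketch of such a filter is below; the subsampling formula and the function names are assumptions chosen to match the logged numbers, not code lifted from train_asr.py:

```python
# Sketch of the cut filter implied by the "Exclude cut ..." warnings: drop any
# cut whose post-subsampling frame count is smaller than its token count.
# frames_after_subsampling() is an assumed ~4x convolutional reduction with
# boundary loss, picked so that 100 input frames -> 23 output frames.

def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t = frames_after_subsampling(num_frames)
    if t < num_tokens:
        print(f"Exclude cut: {num_frames} frames -> {t} after subsampling, "
              f"fewer than {num_tokens} tokens")
        return False
    return True

assert keep_cut(100, 24) is False   # matches the logged exclusion (23 < 24)
```

Because the dummy transcript is tokenized with many standalone '▁' pieces, even a one-second clip overruns the frame budget; a real transcript of similar length would usually survive the filter.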
2023-11-26 04:51:29,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0 2023-11-26 04:51:35,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3227493.3333333335, ans=0.025 2023-11-26 04:51:37,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3227493.3333333335, ans=0.025 2023-11-26 04:51:42,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3227493.3333333335, ans=0.0 2023-11-26 04:51:48,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.901e+01 8.686e+01 9.278e+01 1.012e+02 1.304e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:51:54,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3227560.0, ans=0.125 2023-11-26 04:52:01,920 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484150 2023-11-26 04:52:08,309 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3200, loss[loss=0.05749, simple_loss=0.07375, pruned_loss=0.01216, audio_tagging_loss=0.008456, over 14545.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.0923, pruned_loss=0.01283, audio_tagging_loss=0.008898, over 3043516.35 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:52:09,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3227693.3333333335, ans=0.125 2023-11-26 04:52:13,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3227693.3333333335, ans=0.125 2023-11-26 04:52:29,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3227760.0, ans=15.0 2023-11-26 04:52:49,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3227893.3333333335, ans=0.125 2023-11-26 04:52:57,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484200 2023-11-26 04:52:59,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3227960.0, ans=0.125 2023-11-26 04:53:04,803 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3250, loss[loss=0.08333, simple_loss=0.114, pruned_loss=0.01589, audio_tagging_loss=0.01042, over 15986.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09171, pruned_loss=0.01281, audio_tagging_loss=0.008974, over 3045815.22 frames.
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:53:19,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3228093.3333333335, ans=0.125 2023-11-26 04:53:24,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3228093.3333333335, ans=0.125 2023-11-26 04:53:36,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3228160.0, ans=0.125 2023-11-26 04:53:40,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.676e+01 9.295e+01 9.800e+01 1.223e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 04:53:48,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3228293.3333333335, ans=0.1 2023-11-26 04:53:54,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484250 2023-11-26 04:54:00,679 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3300, loss[loss=0.0559, simple_loss=0.07375, pruned_loss=0.00914, audio_tagging_loss=0.009885, over 15637.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09127, pruned_loss=0.01261, audio_tagging_loss=0.00905, over 3050643.52 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:54:02,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3228360.0, ans=0.125 2023-11-26 04:54:08,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2023-11-26 04:54:21,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3228426.6666666665, ans=0.0 2023-11-26 04:54:24,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.34 vs. limit=10.0 2023-11-26 04:54:24,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3228493.3333333335, ans=0.125 2023-11-26 04:54:26,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=22.5 2023-11-26 04:54:34,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3228560.0, ans=0.125 2023-11-26 04:54:38,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.76 vs. limit=10.0 2023-11-26 04:54:50,297 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484300 2023-11-26 04:54:51,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0 2023-11-26 04:54:55,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3228693.3333333335, ans=0.1 2023-11-26 04:54:56,670 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3350, loss[loss=0.09496, simple_loss=0.1336, pruned_loss=0.01919, audio_tagging_loss=0.00896, over 16049.00 frames. 
2023-11-26 04:54:59,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3228693.3333333335, ans=0.125
2023-11-26 04:55:11,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0
2023-11-26 04:55:22,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3228826.6666666665, ans=0.125
2023-11-26 04:55:32,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.894e+01 9.635e+01 1.028e+02 1.225e+02, threshold=1.927e+02, percent-clipped=0.0
2023-11-26 04:55:32,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3228893.3333333335, ans=0.2
2023-11-26 04:55:32,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3228893.3333333335, ans=0.125
2023-11-26 04:55:35,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3228893.3333333335, ans=0.0
2023-11-26 04:55:46,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484350
2023-11-26 04:55:52,833 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3400, loss[loss=0.07977, simple_loss=0.1099, pruned_loss=0.01454, audio_tagging_loss=0.01029, over 15575.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09099, pruned_loss=0.01263, audio_tagging_loss=0.008875, over 3047861.43 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 04:56:07,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3229093.3333333335, ans=0.2
2023-11-26 04:56:16,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0
2023-11-26 04:56:41,979 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484400
2023-11-26 04:56:49,092 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3450, loss[loss=0.06513, simple_loss=0.08564, pruned_loss=0.01596, audio_tagging_loss=0.006346, over 14579.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09071, pruned_loss=0.01268, audio_tagging_loss=0.008841, over 3050559.76 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 04:56:57,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0
2023-11-26 04:57:02,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3229426.6666666665, ans=0.0
2023-11-26 04:57:02,971 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0
2023-11-26 04:57:07,265 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.39 vs. limit=6.0
2023-11-26 04:57:15,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3229493.3333333335, ans=0.5
2023-11-26 04:57:24,870 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.632e+01 9.209e+01 1.007e+02 1.265e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-26 04:57:38,931 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484450
2023-11-26 04:57:45,200 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3500, loss[loss=0.07498, simple_loss=0.1123, pruned_loss=0.01068, audio_tagging_loss=0.008144, over 15042.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09033, pruned_loss=0.0125, audio_tagging_loss=0.008812, over 3047982.61 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 04:57:48,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0
2023-11-26 04:58:12,786 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 04:58:16,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3229826.6666666665, ans=0.2
2023-11-26 04:58:23,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3229893.3333333335, ans=0.0
2023-11-26 04:58:30,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3229960.0, ans=0.1
2023-11-26 04:58:31,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3229960.0, ans=0.125
2023-11-26 04:58:33,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3229960.0, ans=0.125
2023-11-26 04:58:34,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484500
2023-11-26 04:58:37,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0
2023-11-26 04:58:41,685 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3550, loss[loss=0.083, simple_loss=0.09992, pruned_loss=0.02148, audio_tagging_loss=0.01156, over 14818.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08937, pruned_loss=0.01238, audio_tagging_loss=0.008795, over 3046762.76 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 04:58:41,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3230026.6666666665, ans=0.0
2023-11-26 04:58:46,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230026.6666666665, ans=0.1
2023-11-26 04:58:55,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3230093.3333333335, ans=0.0
2023-11-26 04:59:02,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3230160.0, ans=0.125
2023-11-26 04:59:10,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0
2023-11-26 04:59:17,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230226.6666666665, ans=0.1
2023-11-26 04:59:18,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.467e+01 9.253e+01 9.852e+01 1.320e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-26 04:59:27,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3230293.3333333335, ans=0.1
2023-11-26 04:59:30,882 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484550
2023-11-26 04:59:32,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3230293.3333333335, ans=0.125
2023-11-26 04:59:36,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0
2023-11-26 04:59:37,116 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3600, loss[loss=0.05808, simple_loss=0.07883, pruned_loss=0.007356, audio_tagging_loss=0.01132, over 15056.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08967, pruned_loss=0.01251, audio_tagging_loss=0.008751, over 3041509.97 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 04:59:37,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3230360.0, ans=0.2
2023-11-26 05:00:02,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3230493.3333333335, ans=0.0
2023-11-26 05:00:09,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3230493.3333333335, ans=0.0
2023-11-26 05:00:11,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3230560.0, ans=0.0
2023-11-26 05:00:14,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3230560.0, ans=0.0
2023-11-26 05:00:24,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3230626.6666666665, ans=0.0
2023-11-26 05:00:25,948 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484600
2023-11-26 05:00:30,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3230626.6666666665, ans=0.0
2023-11-26 05:00:32,958 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3650, loss[loss=0.07724, simple_loss=0.1029, pruned_loss=0.01773, audio_tagging_loss=0.008066, over 15716.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08982, pruned_loss=0.01261, audio_tagging_loss=0.008725, over 3052075.84 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:00:34,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=12.0
2023-11-26 05:00:35,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3230693.3333333335, ans=0.1
2023-11-26 05:00:37,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230693.3333333335, ans=0.1
2023-11-26 05:00:42,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230693.3333333335, ans=0.1
2023-11-26 05:01:01,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3230826.6666666665, ans=0.0
2023-11-26 05:01:08,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.921e+01 9.497e+01 1.030e+02 1.167e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 05:01:21,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484650
2023-11-26 05:01:28,664 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3700, loss[loss=0.08011, simple_loss=0.1072, pruned_loss=0.01698, audio_tagging_loss=0.00953, over 14756.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09017, pruned_loss=0.01267, audio_tagging_loss=0.008695, over 3052911.86 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:02:18,217 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484700
2023-11-26 05:02:24,648 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3750, loss[loss=0.05727, simple_loss=0.07823, pruned_loss=0.009661, audio_tagging_loss=0.008493, over 14876.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09084, pruned_loss=0.01285, audio_tagging_loss=0.008673, over 3057310.63 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:02:38,165 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 05:02:41,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0
2023-11-26 05:03:02,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.844e+01 9.429e+01 1.038e+02 1.452e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-26 05:03:02,858 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 05:03:04,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3231560.0, ans=0.125
2023-11-26 05:03:07,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3231560.0, ans=0.125
2023-11-26 05:03:08,247 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 05:03:13,410 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484750
2023-11-26 05:03:20,217 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3800, loss[loss=0.07434, simple_loss=0.104, pruned_loss=0.01385, audio_tagging_loss=0.008473, over 15860.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09192, pruned_loss=0.01297, audio_tagging_loss=0.008688, over 3052307.27 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 05:03:29,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3231693.3333333335, ans=0.125
2023-11-26 05:03:30,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3231760.0, ans=0.1
2023-11-26 05:03:53,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3231893.3333333335, ans=0.0
2023-11-26 05:03:55,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0
2023-11-26 05:03:57,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. limit=10.0
2023-11-26 05:04:09,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484800
2023-11-26 05:04:14,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3231960.0, ans=0.1
2023-11-26 05:04:16,613 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3850, loss[loss=0.05148, simple_loss=0.07494, pruned_loss=0.00778, audio_tagging_loss=0.00623, over 14747.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09171, pruned_loss=0.01276, audio_tagging_loss=0.008721, over 3050468.30 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 05:04:22,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3232026.6666666665, ans=0.125
2023-11-26 05:04:54,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.842e+01 9.367e+01 1.019e+02 1.484e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-26 05:04:54,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3232226.6666666665, ans=0.0
2023-11-26 05:05:05,896 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484850
2023-11-26 05:05:12,185 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3900, loss[loss=0.06393, simple_loss=0.08998, pruned_loss=0.01168, audio_tagging_loss=0.00726, over 14780.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09148, pruned_loss=0.01248, audio_tagging_loss=0.008798, over 3039926.53 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 05:05:12,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3232360.0, ans=0.0
2023-11-26 05:05:15,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3232360.0, ans=0.125
2023-11-26 05:05:24,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3232426.6666666665, ans=0.1
2023-11-26 05:05:25,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3232426.6666666665, ans=0.07
2023-11-26 05:05:37,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3232493.3333333335, ans=0.0
2023-11-26 05:05:48,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0
2023-11-26 05:06:01,426 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484900
2023-11-26 05:06:06,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3232693.3333333335, ans=0.0
2023-11-26 05:06:07,671 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3950, loss[loss=0.07178, simple_loss=0.0955, pruned_loss=0.01625, audio_tagging_loss=0.007778, over 15252.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09185, pruned_loss=0.01254, audio_tagging_loss=0.008789, over 3044398.87 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 05:06:12,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3232693.3333333335, ans=0.2
2023-11-26 05:06:16,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3232693.3333333335, ans=0.125
2023-11-26 05:06:17,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3232760.0, ans=0.125
2023-11-26 05:06:31,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3232826.6666666665, ans=0.0
2023-11-26 05:06:37,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3232826.6666666665, ans=0.125
2023-11-26 05:06:38,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3232826.6666666665, ans=0.1
2023-11-26 05:06:42,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0
2023-11-26 05:06:45,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.918e+01 9.453e+01 1.012e+02 1.260e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-26 05:06:57,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484950
2023-11-26 05:07:04,059 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4000, loss[loss=0.06298, simple_loss=0.07918, pruned_loss=0.01006, audio_tagging_loss=0.01333, over 14992.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09062, pruned_loss=0.01242, audio_tagging_loss=0.009082, over 3038479.88 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:07:08,589 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0
2023-11-26 05:07:36,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3233226.6666666665, ans=0.2
2023-11-26 05:07:41,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3233226.6666666665, ans=0.125
2023-11-26 05:07:44,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3233226.6666666665, ans=0.0
2023-11-26 05:07:54,531 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485000
2023-11-26 05:08:01,218 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4050, loss[loss=0.06487, simple_loss=0.09654, pruned_loss=0.01162, audio_tagging_loss=0.004986, over 14934.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09061, pruned_loss=0.0125, audio_tagging_loss=0.00907, over 3047990.78 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:08:03,381 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 05:08:03,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3233360.0, ans=0.125
2023-11-26 05:08:15,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3233426.6666666665, ans=0.125
2023-11-26 05:08:18,302 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0
2023-11-26 05:08:24,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3233493.3333333335, ans=0.1
2023-11-26 05:08:27,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0
2023-11-26 05:08:32,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3233493.3333333335, ans=0.0
2023-11-26 05:08:38,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.932e+01 9.380e+01 1.022e+02 1.358e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-26 05:08:42,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3233560.0, ans=0.2
2023-11-26 05:08:47,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3233626.6666666665, ans=0.125
2023-11-26 05:08:49,761 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485050
2023-11-26 05:08:49,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3233626.6666666665, ans=0.0
2023-11-26 05:08:56,101 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4100, loss[loss=0.06342, simple_loss=0.08519, pruned_loss=0.01286, audio_tagging_loss=0.007965, over 15790.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09157, pruned_loss=0.0127, audio_tagging_loss=0.009038, over 3047306.21 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:09:16,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3233760.0, ans=0.0
2023-11-26 05:09:30,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3233893.3333333335, ans=0.125
2023-11-26 05:09:31,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3233893.3333333335, ans=0.0
2023-11-26 05:09:39,030 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0
2023-11-26 05:09:41,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5
2023-11-26 05:09:44,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3233960.0, ans=0.0
2023-11-26 05:09:45,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.54 vs. limit=15.0
2023-11-26 05:09:45,638 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485100
2023-11-26 05:09:49,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=12.0
2023-11-26 05:09:51,891 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4150, loss[loss=0.06294, simple_loss=0.07989, pruned_loss=0.014, audio_tagging_loss=0.008986, over 14870.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09139, pruned_loss=0.01283, audio_tagging_loss=0.008934, over 3050418.90 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:09:53,192 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-26 05:10:11,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3234093.3333333335, ans=0.125
2023-11-26 05:10:30,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.624e+01 9.472e+01 1.019e+02 1.478e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-26 05:10:30,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=6.0
2023-11-26 05:10:32,250 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 05:10:40,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3234293.3333333335, ans=0.2
2023-11-26 05:10:41,379 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485150
2023-11-26 05:10:44,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3234293.3333333335, ans=0.0
2023-11-26 05:10:45,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3234293.3333333335, ans=0.2
2023-11-26 05:10:48,116 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4200, loss[loss=0.0599, simple_loss=0.07479, pruned_loss=0.01095, audio_tagging_loss=0.01156, over 15730.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09066, pruned_loss=0.01264, audio_tagging_loss=0.008865, over 3048719.91 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:10:50,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0
2023-11-26 05:10:56,810 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 05:10:57,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3234426.6666666665, ans=0.2
2023-11-26 05:11:02,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3234426.6666666665, ans=0.0
2023-11-26 05:11:11,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3234493.3333333335, ans=0.125
2023-11-26 05:11:13,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3234493.3333333335, ans=0.125
2023-11-26 05:11:17,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3234493.3333333335, ans=0.0
2023-11-26 05:11:30,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3234560.0, ans=0.0
2023-11-26 05:11:37,182 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485200
2023-11-26 05:11:37,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3234626.6666666665, ans=0.125
2023-11-26 05:11:43,691 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4250, loss[loss=0.0506, simple_loss=0.05784, pruned_loss=0.009718, audio_tagging_loss=0.01196, over 16413.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09237, pruned_loss=0.01292, audio_tagging_loss=0.008665, over 3053683.98 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:11:44,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3234693.3333333335, ans=0.125
2023-11-26 05:11:46,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0
2023-11-26 05:12:13,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0
2023-11-26 05:12:16,702 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 05:12:16,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3234893.3333333335, ans=0.125
2023-11-26 05:12:17,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3234893.3333333335, ans=0.125
2023-11-26 05:12:21,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.763e+01 9.377e+01 1.004e+02 4.197e+02, threshold=1.875e+02, percent-clipped=1.0
2023-11-26 05:12:22,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234893.3333333335, ans=0.1
2023-11-26 05:12:29,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3234960.0, ans=0.125
2023-11-26 05:12:33,020 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485250
2023-11-26 05:12:34,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0
2023-11-26 05:12:39,364 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4300, loss[loss=0.05822, simple_loss=0.07837, pruned_loss=0.008981, audio_tagging_loss=0.01005, over 16170.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.0923, pruned_loss=0.01283, audio_tagging_loss=0.008667, over 3052442.07 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:12:40,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3235026.6666666665, ans=0.0
2023-11-26 05:12:48,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0
2023-11-26 05:12:56,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3235093.3333333335, ans=0.125
2023-11-26 05:13:02,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0
2023-11-26 05:13:04,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3235160.0, ans=0.0
2023-11-26 05:13:19,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3235226.6666666665, ans=0.0
2023-11-26 05:13:27,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3235293.3333333335, ans=0.2
2023-11-26 05:13:28,995 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485300
2023-11-26 05:13:35,803 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4350, loss[loss=0.07004, simple_loss=0.1012, pruned_loss=0.01164, audio_tagging_loss=0.007822, over 15226.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09236, pruned_loss=0.01293, audio_tagging_loss=0.008705, over 3049408.06 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 05:13:35,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3235360.0, ans=0.0
2023-11-26 05:13:37,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3235360.0, ans=0.2
2023-11-26 05:13:45,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3235360.0, ans=0.0
2023-11-26 05:13:49,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0
2023-11-26 05:14:04,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3235493.3333333335, ans=0.125
2023-11-26 05:14:14,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.639e+01 9.414e+01 1.000e+02 1.262e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-26 05:14:25,131 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485350
2023-11-26 05:14:31,451 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4400, loss[loss=0.09321, simple_loss=0.1257, pruned_loss=0.02287, audio_tagging_loss=0.007469, over 15286.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09123, pruned_loss=0.0127, audio_tagging_loss=0.008801, over 3046387.72 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:14:31,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3235693.3333333335, ans=0.0
2023-11-26 05:14:51,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=22.5
2023-11-26 05:14:51,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5
2023-11-26 05:14:53,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3235826.6666666665, ans=0.0
2023-11-26 05:15:19,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485400
2023-11-26 05:15:21,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3235960.0, ans=0.1
2023-11-26 05:15:27,072 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4450, loss[loss=0.04052, simple_loss=0.04695, pruned_loss=0.006237, audio_tagging_loss=0.0108, over 14717.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09125, pruned_loss=0.01279, audio_tagging_loss=0.008785, over 3050204.56 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:15:42,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3236093.3333333335, ans=0.05
2023-11-26 05:15:42,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0
2023-11-26 05:15:57,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3236160.0, ans=0.125
2023-11-26 05:15:58,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3236160.0, ans=0.0
2023-11-26 05:15:59,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3236226.6666666665, ans=0.0
2023-11-26 05:16:05,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3236226.6666666665, ans=0.2
2023-11-26 05:16:06,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.912e+01 9.547e+01 1.021e+02 1.319e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-26 05:16:16,075 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485450
2023-11-26 05:16:23,543 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4500, loss[loss=0.03848, simple_loss=0.05042, pruned_loss=0.004975, audio_tagging_loss=0.008292, over 15126.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09002, pruned_loss=0.01254, audio_tagging_loss=0.008855, over 3044623.44 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:16:48,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0
2023-11-26 05:17:12,566 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485500
2023-11-26 05:17:15,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3236626.6666666665, ans=0.0
2023-11-26 05:17:15,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3236626.6666666665, ans=0.125
2023-11-26 05:17:18,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3236693.3333333335, ans=0.04949747468305833
2023-11-26 05:17:19,340 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4550, loss[loss=0.06167, simple_loss=0.08448, pruned_loss=0.01156, audio_tagging_loss=0.007865, over 13749.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08979, pruned_loss=0.01255, audio_tagging_loss=0.008828, over 3037358.96 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:17:31,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1
2023-11-26 05:17:35,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3236760.0, ans=0.1
2023-11-26 05:17:40,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0
2023-11-26 05:17:46,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0
2023-11-26 05:17:48,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3236826.6666666665, ans=0.125
2023-11-26 05:17:50,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3236826.6666666665, ans=0.125
2023-11-26 05:17:57,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.528e+01 9.112e+01 9.671e+01 1.236e+02, threshold=1.822e+02, percent-clipped=0.0
2023-11-26 05:17:58,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3236893.3333333335, ans=0.125
2023-11-26 05:18:00,022 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 05:18:08,220 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485550
2023-11-26 05:18:15,111 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4600, loss[loss=0.06495, simple_loss=0.09509, pruned_loss=0.00897, audio_tagging_loss=0.008437, over 15419.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08946, pruned_loss=0.01259, audio_tagging_loss=0.008828, over 3041972.65 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0
2023-11-26 05:18:46,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0
2023-11-26 05:19:03,804 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485600
2023-11-26 05:19:06,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0
2023-11-26 05:19:10,778 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4650, loss[loss=0.06557, simple_loss=0.08915, pruned_loss=0.01214, audio_tagging_loss=0.008847, over 14848.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09036, pruned_loss=0.01273, audio_tagging_loss=0.008847, over 3044719.73 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:19:18,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0
2023-11-26 05:19:19,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3237360.0, ans=0.0
2023-11-26 05:19:33,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3237493.3333333335, ans=0.125
2023-11-26 05:19:36,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3237493.3333333335, ans=0.1
2023-11-26 05:19:40,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3237493.3333333335, ans=0.125
2023-11-26 05:19:42,427 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=22.5
2023-11-26 05:19:45,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3237560.0, ans=0.125
2023-11-26 05:19:51,904 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.706e+01 9.399e+01 1.022e+02 1.601e+02, threshold=1.880e+02, percent-clipped=0.0
2023-11-26 05:19:55,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3237626.6666666665, ans=0.0
2023-11-26 05:20:00,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485650
2023-11-26 05:20:06,204 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4700, loss[loss=0.06153, simple_loss=0.07959, pruned_loss=0.0121, audio_tagging_loss=0.009638, over 14832.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.0907, pruned_loss=0.01274, audio_tagging_loss=0.008879, over 3055531.36 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:20:06,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3237693.3333333335, ans=0.025
2023-11-26 05:20:14,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0
2023-11-26 05:20:18,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3237760.0, ans=0.125
2023-11-26 05:20:54,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485700
2023-11-26 05:20:59,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0
2023-11-26 05:21:02,120 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4750, loss[loss=0.08129, simple_loss=0.1031, pruned_loss=0.01946, audio_tagging_loss=0.01029, over 13703.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09036, pruned_loss=0.01275, audio_tagging_loss=0.008909, over 3062072.58 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:21:03,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3238026.6666666665, ans=0.125
2023-11-26 05:21:14,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3238093.3333333335, ans=0.1
2023-11-26 05:21:36,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0
2023-11-26 05:21:42,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.672e+01 9.207e+01 9.886e+01 1.229e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-26 05:21:50,944 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485750
2023-11-26 05:21:57,727 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4800, loss[loss=0.06158, simple_loss=0.08575, pruned_loss=0.009472, audio_tagging_loss=0.009228, over 16275.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09006, pruned_loss=0.0126, audio_tagging_loss=0.009028, over 3057360.22 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 05:21:57,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3238360.0, ans=0.125
2023-11-26 05:22:28,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3238493.3333333335, ans=0.125
2023-11-26 05:22:46,892 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485800
2023-11-26 05:22:48,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0
2023-11-26 05:22:53,483 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4850, loss[loss=0.07458, simple_loss=0.1043, pruned_loss=0.01308, audio_tagging_loss=0.009351, over 14844.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08946, pruned_loss=0.01253, audio_tagging_loss=0.009179, over 3056677.10 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:22:53,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3238693.3333333335, ans=0.04949747468305833
2023-11-26 05:23:05,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0
2023-11-26 05:23:09,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3238760.0, ans=0.04949747468305833
2023-11-26 05:23:10,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3238760.0, ans=0.125
2023-11-26 05:23:33,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3238893.3333333335, ans=0.125
2023-11-26 05:23:35,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.610e+01 9.289e+01 1.009e+02 1.598e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 05:23:41,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485850
2023-11-26 05:23:46,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2023-11-26 05:23:48,248 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4900, loss[loss=0.07338, simple_loss=0.09933, pruned_loss=0.01238, audio_tagging_loss=0.01134, over 15349.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08992, pruned_loss=0.01247, audio_tagging_loss=0.009021, over 3055726.07 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:23:49,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3239026.6666666665, ans=0.0
2023-11-26 05:23:56,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3239026.6666666665, ans=0.0
2023-11-26 05:24:08,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3239093.3333333335, ans=0.0
2023-11-26 05:24:11,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3239160.0, ans=0.125
2023-11-26 05:24:14,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3239160.0, ans=0.0
2023-11-26 05:24:19,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0
2023-11-26 05:24:20,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0
2023-11-26 05:24:25,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3239226.6666666665, ans=0.1
2023-11-26 05:24:37,338 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485900
2023-11-26 05:24:43,567 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4950, loss[loss=0.05293, simple_loss=0.07171, pruned_loss=0.00582, audio_tagging_loss=0.01125, over 15921.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09001, pruned_loss=0.01246, audio_tagging_loss=0.008847, over 3057558.56 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:24:45,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3239360.0, ans=0.125
2023-11-26 05:24:50,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239360.0, ans=0.1
2023-11-26 05:24:51,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3239360.0, ans=0.125
2023-11-26 05:25:07,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3239493.3333333335, ans=0.0
2023-11-26 05:25:25,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3239560.0, ans=0.125
2023-11-26 05:25:25,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.689e+01 9.233e+01 9.794e+01 1.211e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-26 05:25:33,294 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485950
2023-11-26 05:25:39,614 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5000, loss[loss=0.07025, simple_loss=0.1004, pruned_loss=0.0131, audio_tagging_loss=0.006952, over 15597.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09007, pruned_loss=0.01252, audio_tagging_loss=0.008689, over 3047276.02 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:26:10,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3239826.6666666665, ans=0.125
2023-11-26 05:26:28,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486000
2023-11-26 05:26:28,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3239960.0, ans=0.035
2023-11-26 05:26:29,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3239960.0, ans=0.125
2023-11-26 05:26:32,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3239960.0, ans=0.0
2023-11-26 05:26:34,635 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5050, loss[loss=0.05614, simple_loss=0.06685, pruned_loss=0.01205, audio_tagging_loss=0.01066, over 14936.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09011, pruned_loss=0.01261, audio_tagging_loss=0.008667, over 3044108.58 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:26:42,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240026.6666666665, ans=0.1
2023-11-26 05:26:53,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0
2023-11-26 05:26:55,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=12.0
2023-11-26 05:26:59,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3240160.0, ans=0.0
2023-11-26 05:27:16,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 8.723e+01 9.210e+01 1.029e+02 1.181e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-26 05:27:18,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0
2023-11-26 05:27:23,666 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486050
2023-11-26 05:27:30,370 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5100, loss[loss=0.0855, simple_loss=0.1119, pruned_loss=0.02047, audio_tagging_loss=0.00909, over 14127.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08986, pruned_loss=0.0126, audio_tagging_loss=0.008693, over 3045664.73 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:27:42,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3240426.6666666665, ans=0.125
2023-11-26 05:28:09,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0
2023-11-26 05:28:19,379 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486100
2023-11-26 05:28:26,091 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5150, loss[loss=0.0427, simple_loss=0.05219, pruned_loss=0.006482, audio_tagging_loss=0.01013, over 15781.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08952, pruned_loss=0.01257, audio_tagging_loss=0.008746, over 3046298.30 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 8.0
2023-11-26 05:28:51,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3240826.6666666665, ans=0.1
2023-11-26 05:29:08,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.813e+01 9.450e+01 1.017e+02 1.282e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-26 05:29:12,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3240960.0, ans=0.125
2023-11-26 05:29:14,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486150
2023-11-26 05:29:19,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3240960.0, ans=0.5
2023-11-26 05:29:21,072 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5200, loss[loss=0.06679, simple_loss=0.09474, pruned_loss=0.01267, audio_tagging_loss=0.006743, over 15972.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08963, pruned_loss=0.01257, audio_tagging_loss=0.008802, over 3046675.90 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0
2023-11-26 05:29:26,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3241026.6666666665, ans=0.125
2023-11-26 05:29:33,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0
limit=15.0 2023-11-26 05:29:45,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3241160.0, ans=0.05 2023-11-26 05:29:55,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3241226.6666666665, ans=0.1 2023-11-26 05:30:10,294 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486200 2023-11-26 05:30:16,780 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5250, loss[loss=0.05046, simple_loss=0.06932, pruned_loss=0.008644, audio_tagging_loss=0.007149, over 15542.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08937, pruned_loss=0.01252, audio_tagging_loss=0.008704, over 3046165.51 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:30:27,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3241426.6666666665, ans=10.0 2023-11-26 05:30:32,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3241426.6666666665, ans=0.07 2023-11-26 05:30:44,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3241493.3333333335, ans=0.125 2023-11-26 05:30:56,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3241560.0, ans=0.0 2023-11-26 05:30:58,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.725e+01 9.409e+01 1.008e+02 1.630e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 05:31:05,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486250 2023-11-26 05:31:07,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3241626.6666666665, ans=0.125 2023-11-26 05:31:08,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3241626.6666666665, ans=0.1 2023-11-26 05:31:13,296 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5300, loss[loss=0.08175, simple_loss=0.1165, pruned_loss=0.01752, audio_tagging_loss=0.006008, over 15915.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08925, pruned_loss=0.01234, audio_tagging_loss=0.008726, over 3051439.41 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:31:30,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3241760.0, ans=0.07 2023-11-26 05:31:31,729 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.73 vs. 
limit=12.0 2023-11-26 05:31:33,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3241826.6666666665, ans=0.1 2023-11-26 05:31:36,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3241826.6666666665, ans=0.125 2023-11-26 05:31:54,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3241893.3333333335, ans=0.1 2023-11-26 05:31:59,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3241960.0, ans=0.125 2023-11-26 05:32:01,933 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486300 2023-11-26 05:32:07,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3242026.6666666665, ans=0.125 2023-11-26 05:32:08,103 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5350, loss[loss=0.05546, simple_loss=0.06967, pruned_loss=0.008961, audio_tagging_loss=0.01167, over 15244.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08995, pruned_loss=0.01231, audio_tagging_loss=0.008702, over 3056547.00 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:32:49,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.485e+01 9.147e+01 9.991e+01 1.214e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 05:32:50,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3242226.6666666665, ans=0.1 2023-11-26 05:32:56,336 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486350 2023-11-26 05:33:03,201 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5400, loss[loss=0.0574, simple_loss=0.07324, pruned_loss=0.01157, audio_tagging_loss=0.009215, over 15075.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09043, pruned_loss=0.01236, audio_tagging_loss=0.008711, over 3055844.57 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:33:12,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3242360.0, ans=0.2 2023-11-26 05:33:28,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3242493.3333333335, ans=0.125 2023-11-26 05:33:34,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3242493.3333333335, ans=0.125 2023-11-26 05:33:51,799 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486400 2023-11-26 05:33:59,176 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5450, loss[loss=0.05757, simple_loss=0.07509, pruned_loss=0.01129, audio_tagging_loss=0.008745, over 14461.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09064, pruned_loss=0.01247, audio_tagging_loss=0.008687, over 3048659.04 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:34:05,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3242693.3333333335, ans=0.125 2023-11-26 05:34:12,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.10 vs. 
limit=10.0 2023-11-26 05:34:28,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3242826.6666666665, ans=0.1 2023-11-26 05:34:30,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3242893.3333333335, ans=10.0 2023-11-26 05:34:41,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.605e+01 9.179e+01 9.906e+01 1.952e+02, threshold=1.836e+02, percent-clipped=1.0 2023-11-26 05:34:48,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486450 2023-11-26 05:34:49,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3242960.0, ans=0.0 2023-11-26 05:34:49,649 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2023-11-26 05:34:54,501 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5500, loss[loss=0.05743, simple_loss=0.07925, pruned_loss=0.008392, audio_tagging_loss=0.009417, over 15853.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08977, pruned_loss=0.01226, audio_tagging_loss=0.008763, over 3044059.97 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:34:54,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3243026.6666666665, ans=0.125 2023-11-26 05:35:04,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2023-11-26 05:35:36,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3243226.6666666665, ans=10.0 2023-11-26 05:35:40,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-11-26 05:35:42,918 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486500 2023-11-26 05:35:49,791 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5550, loss[loss=0.09074, simple_loss=0.1315, pruned_loss=0.01801, audio_tagging_loss=0.006978, over 15616.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08992, pruned_loss=0.01233, audio_tagging_loss=0.00883, over 3043289.91 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:35:51,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-26 05:35:55,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3243360.0, ans=0.125 2023-11-26 05:35:58,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. 
limit=15.0 2023-11-26 05:36:04,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3243426.6666666665, ans=0.1 2023-11-26 05:36:09,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3243426.6666666665, ans=0.2 2023-11-26 05:36:11,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243493.3333333335, ans=0.1 2023-11-26 05:36:12,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3243493.3333333335, ans=0.125 2023-11-26 05:36:32,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.745e+01 9.267e+01 1.002e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 05:36:34,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3243626.6666666665, ans=0.1 2023-11-26 05:36:38,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486550 2023-11-26 05:36:45,267 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5600, loss[loss=0.08438, simple_loss=0.1106, pruned_loss=0.01977, audio_tagging_loss=0.00933, over 14971.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09049, pruned_loss=0.01249, audio_tagging_loss=0.008969, over 3042507.01 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:36:53,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3243693.3333333335, ans=0.2 2023-11-26 05:37:22,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243893.3333333335, ans=0.1 2023-11-26 05:37:23,716 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:37:33,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3243960.0, ans=0.125 2023-11-26 05:37:34,268 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486600 2023-11-26 05:37:36,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3243960.0, ans=0.0 2023-11-26 05:37:37,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3243960.0, ans=0.2 2023-11-26 05:37:40,742 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5650, loss[loss=0.04864, simple_loss=0.05913, pruned_loss=0.006981, audio_tagging_loss=0.0121, over 14871.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08971, pruned_loss=0.01234, audio_tagging_loss=0.009145, over 3047214.56 frames. 
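The per-batch records above log the total loss alongside its components, and the numbers are mutually consistent with the weighting loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g. at batch 5000: 0.5 * 0.09007 + 0.01252 + 0.008689 ≈ 0.06624, the logged tot_loss). A minimal sketch of that recombination, assuming this weighting holds throughout; the function name is illustrative, not icefall's actual API:

```python
# Recombine the logged loss components into the logged total.
# The 0.5 weight on simple_loss is inferred from the numbers above;
# names are illustrative, not icefall's actual API.
def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 5000 from the log: 0.5 * 0.09007 + 0.01252 + 0.008689
print(f"{combine_losses(0.09007, 0.01252, 0.008689):.5f}")  # 0.06624
```

The same check holds at batch 5050 (0.5 * 0.09011 + 0.01261 + 0.008667 ≈ 0.06633), so the decomposition appears stable across batches.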
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:38:05,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3244160.0, ans=0.0 2023-11-26 05:38:21,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-26 05:38:23,773 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.720e+01 9.280e+01 9.877e+01 1.364e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 05:38:28,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-11-26 05:38:29,535 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486650 2023-11-26 05:38:35,766 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5700, loss[loss=0.06134, simple_loss=0.08157, pruned_loss=0.01216, audio_tagging_loss=0.008399, over 14776.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08917, pruned_loss=0.0123, audio_tagging_loss=0.009151, over 3045653.71 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:38:41,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3244360.0, ans=0.0 2023-11-26 05:38:45,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2023-11-26 05:38:53,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3244426.6666666665, ans=0.1 2023-11-26 05:39:02,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3244493.3333333335, ans=0.0 2023-11-26 05:39:08,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3244560.0, ans=0.125 2023-11-26 05:39:24,736 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486700 2023-11-26 05:39:31,501 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5750, loss[loss=0.0869, simple_loss=0.1129, pruned_loss=0.02112, audio_tagging_loss=0.009317, over 15206.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08845, pruned_loss=0.01211, audio_tagging_loss=0.009049, over 3046750.81 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:39:31,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3244693.3333333335, ans=0.09899494936611666 2023-11-26 05:39:47,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3244760.0, ans=0.2 2023-11-26 05:39:48,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3244760.0, ans=0.2 2023-11-26 05:39:59,060 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:40:15,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.613e+01 9.170e+01 1.044e+02 1.478e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 05:40:17,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3244960.0, ans=0.125 2023-11-26 05:40:20,594 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486750 2023-11-26 05:40:26,821 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5800, loss[loss=0.06691, simple_loss=0.09515, pruned_loss=0.01426, audio_tagging_loss=0.005075, over 15604.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08843, pruned_loss=0.01221, audio_tagging_loss=0.008941, over 3040683.70 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:40:27,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3245026.6666666665, ans=0.1 2023-11-26 05:40:31,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3245026.6666666665, ans=0.2 2023-11-26 05:40:36,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3245093.3333333335, ans=0.125 2023-11-26 05:40:56,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3245160.0, ans=0.125 2023-11-26 05:41:14,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-26 05:41:15,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486800 2023-11-26 05:41:18,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3245293.3333333335, ans=0.2 2023-11-26 05:41:21,971 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5850, loss[loss=0.05547, simple_loss=0.07334, pruned_loss=0.007365, audio_tagging_loss=0.01144, over 15101.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08775, pruned_loss=0.01213, audio_tagging_loss=0.008989, over 3037984.35 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:41:33,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3245426.6666666665, ans=0.0 2023-11-26 05:41:35,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3245426.6666666665, ans=0.035 2023-11-26 05:41:55,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3245560.0, ans=0.0 2023-11-26 05:41:59,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3245560.0, ans=0.1 2023-11-26 05:42:06,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.540e+01 9.221e+01 1.014e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 05:42:11,703 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486850 2023-11-26 05:42:17,537 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.03 vs. limit=10.0 2023-11-26 05:42:17,867 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5900, loss[loss=0.09078, simple_loss=0.1226, pruned_loss=0.01987, audio_tagging_loss=0.009594, over 15216.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08766, pruned_loss=0.01215, audio_tagging_loss=0.009003, over 3035847.82 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:43:06,748 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486900 2023-11-26 05:43:13,577 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5950, loss[loss=0.06478, simple_loss=0.0956, pruned_loss=0.01143, audio_tagging_loss=0.005552, over 14936.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08838, pruned_loss=0.01224, audio_tagging_loss=0.008905, over 3037776.18 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:43:36,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3246160.0, ans=0.0 2023-11-26 05:43:48,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246226.6666666665, ans=0.1 2023-11-26 05:43:54,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3246226.6666666665, ans=0.125 2023-11-26 05:43:57,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.522e+01 9.337e+01 1.020e+02 1.344e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:44:02,137 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486950 2023-11-26 05:44:08,294 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6000, loss[loss=0.05271, simple_loss=0.06869, pruned_loss=0.008521, audio_tagging_loss=0.009841, over 15331.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08926, pruned_loss=0.0124, audio_tagging_loss=0.008821, over 3037933.57 frames. 
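The optim.py:476 records above report a five-number summary of recent gradient norms (apparently min / 25% / median / 75% / max) together with a clipping threshold, and in every record the threshold equals Clipping_scale (2.0) times the middle quartile (e.g. 2.0 × 9.221e+01 = 1.844e+02), suggesting an adaptive rule that clips to a multiple of the running median norm. A rough sketch under that assumption; this is not icefall's actual ScaledAdam code, and the window size is made up:

```python
# Rough sketch of the adaptive clipping suggested by the optim.py records:
# threshold = clipping_scale * median of recent gradient norms (every
# logged threshold above is exactly 2.0x the middle quartile). Not
# icefall's actual implementation; window size is assumed.
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms: list[float] = []

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        # Five-number summary, as printed in the "grad-norm quartiles" log.
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 x median
        if norm > threshold:  # such batches feed 'percent-clipped'
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```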
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:44:08,294 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 05:44:27,664 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5217, 3.3279, 3.8618, 3.6096], device='cuda:2') 2023-11-26 05:44:40,566 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05752, simple_loss=0.0506, pruned_loss=0.005164, audio_tagging_loss=0.02705, over 4681554.00 frames. 2023-11-26 05:44:40,567 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 05:45:02,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2023-11-26 05:45:08,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3246493.3333333335, ans=0.1 2023-11-26 05:45:14,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3246560.0, ans=0.2 2023-11-26 05:45:20,427 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:45:29,947 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-26 05:45:30,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3246626.6666666665, ans=0.2 2023-11-26 05:45:30,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.28 vs. limit=10.0 2023-11-26 05:45:36,449 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6050, loss[loss=0.07361, simple_loss=0.09642, pruned_loss=0.01562, audio_tagging_loss=0.009779, over 14469.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08862, pruned_loss=0.01213, audio_tagging_loss=0.008793, over 3045891.13 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:45:44,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3246693.3333333335, ans=0.1 2023-11-26 05:45:50,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. 
limit=15.0 2023-11-26 05:45:51,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3246760.0, ans=0.2 2023-11-26 05:46:03,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3246826.6666666665, ans=0.2 2023-11-26 05:46:17,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3246893.3333333335, ans=0.2 2023-11-26 05:46:21,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.605e+01 9.174e+01 9.669e+01 1.333e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 05:46:21,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3246960.0, ans=0.125 2023-11-26 05:46:25,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-26 05:46:29,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3246960.0, ans=6.0 2023-11-26 05:46:32,132 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6100, loss[loss=0.0682, simple_loss=0.08787, pruned_loss=0.01662, audio_tagging_loss=0.007644, over 15183.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08899, pruned_loss=0.01221, audio_tagging_loss=0.008807, over 3048032.98 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:46:48,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3247093.3333333335, ans=0.125 2023-11-26 05:46:49,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3247093.3333333335, ans=0.0 2023-11-26 05:47:06,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3247226.6666666665, ans=0.125 2023-11-26 05:47:08,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3247226.6666666665, ans=0.125 2023-11-26 05:47:18,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3247293.3333333335, ans=0.125 2023-11-26 05:47:20,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3247293.3333333335, ans=0.0 2023-11-26 05:47:21,769 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-26 05:47:22,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-26 05:47:28,038 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6150, loss[loss=0.07892, simple_loss=0.116, pruned_loss=0.01369, audio_tagging_loss=0.007216, over 15050.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08944, pruned_loss=0.01234, audio_tagging_loss=0.008787, over 3053386.67 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:47:28,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.08 vs. 
limit=22.5 2023-11-26 05:47:45,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3247426.6666666665, ans=0.125 2023-11-26 05:47:48,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3247426.6666666665, ans=0.1 2023-11-26 05:47:49,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2023-11-26 05:47:50,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3247493.3333333335, ans=0.0 2023-11-26 05:47:58,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3247493.3333333335, ans=0.0 2023-11-26 05:48:09,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247560.0, ans=0.1 2023-11-26 05:48:12,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.728e+01 9.335e+01 1.012e+02 1.245e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:48:12,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3247626.6666666665, ans=0.0 2023-11-26 05:48:17,899 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-26 05:48:24,207 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6200, loss[loss=0.04713, simple_loss=0.05572, pruned_loss=0.007055, audio_tagging_loss=0.01222, over 17270.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09018, pruned_loss=0.0125, audio_tagging_loss=0.008843, over 3057270.27 frames. ], batch size: 67, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:48:26,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-26 05:48:46,568 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-11-26 05:48:50,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-26 05:49:03,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3247893.3333333335, ans=0.0 2023-11-26 05:49:04,959 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:49:06,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3247893.3333333335, ans=0.1 2023-11-26 05:49:13,231 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-26 05:49:14,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3247960.0, ans=0.0 2023-11-26 05:49:19,827 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6250, loss[loss=0.09712, simple_loss=0.1348, pruned_loss=0.02255, audio_tagging_loss=0.007159, over 16392.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09078, pruned_loss=0.01249, audio_tagging_loss=0.008867, over 3057015.79 frames. 
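The scaling.py:213 lines trace ScheduledFloat values: regularization constants such as dropout_p, balancer prob, and the various skip rates are functions of batch_count rather than fixed hyperparameters, and the log prints the current value (ans=...) whenever one is queried. A minimal sketch of a piecewise-linear schedule with that behavior; the endpoints below are made up for illustration, and icefall's real ScheduledFloat has more machinery:

```python
# Minimal sketch of a batch-count-driven schedule like the ScheduledFloat
# values traced above. The real icefall class is richer; this only
# illustrates the idea of a value interpolated against batch_count.
class PiecewiseLinearSchedule:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# E.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches
# (endpoints invented for illustration):
skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (20000.0, 0.0))
print(skip_rate(3239360.0))  # long past the ramp -> 0.0, like 'ans=0.0'
```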
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:49:44,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3248160.0, ans=10.0 2023-11-26 05:49:44,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3248160.0, ans=0.1 2023-11-26 05:49:47,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3248160.0, ans=10.0 2023-11-26 05:49:48,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248160.0, ans=0.1 2023-11-26 05:49:56,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3248226.6666666665, ans=0.125 2023-11-26 05:50:04,065 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.628e+01 9.158e+01 1.005e+02 1.454e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 05:50:08,406 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-26 05:50:15,185 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6300, loss[loss=0.07083, simple_loss=0.09402, pruned_loss=0.01362, audio_tagging_loss=0.0102, over 14725.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09101, pruned_loss=0.01259, audio_tagging_loss=0.008963, over 3051435.71 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:50:47,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2023-11-26 05:51:04,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-26 05:51:10,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3248626.6666666665, ans=0.125 2023-11-26 05:51:11,884 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6350, loss[loss=0.06621, simple_loss=0.09307, pruned_loss=0.01203, audio_tagging_loss=0.007642, over 14497.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08967, pruned_loss=0.01237, audio_tagging_loss=0.009122, over 3048009.25 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:51:13,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3248693.3333333335, ans=0.125 2023-11-26 05:51:30,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3248760.0, ans=0.05 2023-11-26 05:51:33,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. 
limit=15.0 2023-11-26 05:51:39,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3248826.6666666665, ans=0.2 2023-11-26 05:51:41,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248826.6666666665, ans=0.1 2023-11-26 05:51:42,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3248826.6666666665, ans=0.125 2023-11-26 05:51:54,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=3248893.3333333335, ans=22.5 2023-11-26 05:51:56,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.555e+01 9.166e+01 9.747e+01 1.455e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 05:52:00,763 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-26 05:52:00,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3248960.0, ans=0.1 2023-11-26 05:52:06,939 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6400, loss[loss=0.06723, simple_loss=0.09324, pruned_loss=0.01307, audio_tagging_loss=0.00754, over 15602.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08957, pruned_loss=0.01226, audio_tagging_loss=0.009239, over 3045936.82 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:52:08,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3249026.6666666665, ans=0.125 2023-11-26 05:52:16,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-26 05:52:18,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3249093.3333333335, ans=0.125 2023-11-26 05:52:20,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-26 05:52:26,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3249093.3333333335, ans=0.125 2023-11-26 05:52:55,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-26 05:53:02,863 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6450, loss[loss=0.08422, simple_loss=0.1094, pruned_loss=0.02158, audio_tagging_loss=0.007911, over 14926.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08885, pruned_loss=0.01221, audio_tagging_loss=0.009337, over 3045660.45 frames. 
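The scaling.py:1022 lines compare a per-module whitening metric against a limit (e.g. metric=6.94 vs. limit=15.0): the metric evidently measures how far a module's output covariance is from isotropic, with a penalty applied only when it exceeds the limit. One plausible formulation, which is 1.0 for perfectly white activations and grows toward num_channels as the covariance collapses, is sketched below; this is a guessed reconstruction of the logged quantity, not icefall's actual Whiten module:

```python
# Assumed reconstruction of the whitening metric logged above: for a
# feature covariance C over num_channels dims, num_channels*tr(C^2)/tr(C)^2
# equals 1.0 when C is a multiple of the identity ("white") and approaches
# num_channels as C collapses to rank one. Not icefall's actual code.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one group."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    num_channels = x.shape[1]
    return (num_channels * torch.trace(cov @ cov)
            / torch.trace(cov) ** 2).item()

white = torch.randn(10000, 384)            # ~identity covariance
collapsed = white[:, :1].repeat(1, 384)    # rank-one covariance
print(whitening_metric(white))      # ~1.0
print(whitening_metric(collapsed))  # ~384, far above limit=15.0
```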
], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:53:11,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3249360.0, ans=0.125 2023-11-26 05:53:32,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3249493.3333333335, ans=0.125 2023-11-26 05:53:47,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.690e+01 9.179e+01 1.001e+02 1.387e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 05:53:52,270 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-26 05:53:56,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3249626.6666666665, ans=0.125 2023-11-26 05:53:59,112 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6500, loss[loss=0.07266, simple_loss=0.1021, pruned_loss=0.01424, audio_tagging_loss=0.007398, over 15892.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08881, pruned_loss=0.01219, audio_tagging_loss=0.009218, over 3044505.21 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:54:09,430 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:54:22,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3249826.6666666665, ans=0.1 2023-11-26 05:54:42,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2023-11-26 05:54:44,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3249960.0, ans=0.125 2023-11-26 05:54:48,356 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-26 05:54:51,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3249960.0, ans=0.09899494936611666 2023-11-26 05:54:54,635 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6550, loss[loss=0.0642, simple_loss=0.09217, pruned_loss=0.01043, audio_tagging_loss=0.007681, over 15100.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08975, pruned_loss=0.01226, audio_tagging_loss=0.008954, over 3038577.24 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:55:39,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.523e+01 8.995e+01 9.830e+01 1.214e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-26 05:55:43,709 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-26 05:55:46,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-26 05:55:50,078 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6600, loss[loss=0.06902, simple_loss=0.09681, pruned_loss=0.01235, audio_tagging_loss=0.008273, over 14473.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08997, pruned_loss=0.0123, audio_tagging_loss=0.008833, over 3042115.73 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:56:02,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3250426.6666666665, ans=0.125 2023-11-26 05:56:03,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2023-11-26 05:56:11,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0 2023-11-26 05:56:22,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3250493.3333333335, ans=0.0 2023-11-26 05:56:27,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3250560.0, ans=0.035 2023-11-26 05:56:40,087 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-26 05:56:47,245 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6650, loss[loss=0.06875, simple_loss=0.09322, pruned_loss=0.0143, audio_tagging_loss=0.00784, over 15333.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09035, pruned_loss=0.01237, audio_tagging_loss=0.008842, over 3046158.84 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:56:52,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3250693.3333333335, ans=0.1 2023-11-26 05:56:58,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3250760.0, ans=0.05 2023-11-26 05:57:07,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3250760.0, ans=0.1 2023-11-26 05:57:11,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3250826.6666666665, ans=0.0 2023-11-26 05:57:32,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.660e+01 9.061e+01 9.694e+01 1.150e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-26 05:57:36,390 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-26 05:57:40,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3250960.0, ans=0.125 2023-11-26 05:57:42,722 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6700, loss[loss=0.05385, simple_loss=0.07587, pruned_loss=0.008934, audio_tagging_loss=0.006981, over 15887.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09083, pruned_loss=0.01261, audio_tagging_loss=0.008674, over 3047865.27 frames. 
], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:57:53,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3251093.3333333335, ans=0.1 2023-11-26 05:58:08,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3251160.0, ans=0.125 2023-11-26 05:58:11,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3251160.0, ans=0.1 2023-11-26 05:58:13,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3251160.0, ans=0.125 2023-11-26 05:58:21,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3251226.6666666665, ans=0.0 2023-11-26 05:58:32,033 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-26 05:58:32,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3251293.3333333335, ans=0.05 2023-11-26 05:58:37,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3251360.0, ans=0.5 2023-11-26 05:58:38,298 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6750, loss[loss=0.06052, simple_loss=0.08288, pruned_loss=0.01143, audio_tagging_loss=0.007642, over 14124.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08993, pruned_loss=0.01264, audio_tagging_loss=0.008642, over 3038375.36 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:58:43,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3251360.0, ans=0.0 2023-11-26 05:58:59,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3251426.6666666665, ans=0.125 2023-11-26 05:59:24,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.663e+01 9.356e+01 1.018e+02 1.599e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 05:59:27,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-26 05:59:34,841 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6800, loss[loss=0.05915, simple_loss=0.07749, pruned_loss=0.01218, audio_tagging_loss=0.008225, over 14553.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09029, pruned_loss=0.0128, audio_tagging_loss=0.00852, over 3037816.07 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:59:42,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3251693.3333333335, ans=0.0 2023-11-26 06:00:01,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2023-11-26 06:00:21,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3251960.0, ans=0.0 2023-11-26 06:00:24,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-26 06:00:31,086 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6850, loss[loss=0.06297, simple_loss=0.08606, pruned_loss=0.01273, audio_tagging_loss=0.007215, over 14573.00 frames. 
], tot_loss[loss=0.0661, simple_loss=0.09005, pruned_loss=0.01254, audio_tagging_loss=0.008528, over 3042615.58 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:00:49,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=12.0 2023-11-26 06:00:49,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=8.0 2023-11-26 06:00:59,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3252160.0, ans=0.125 2023-11-26 06:00:59,494 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:01:11,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3252226.6666666665, ans=0.125 2023-11-26 06:01:16,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.578e+01 9.183e+01 9.945e+01 1.364e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 06:01:19,750 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-26 06:01:26,608 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6900, loss[loss=0.05903, simple_loss=0.07956, pruned_loss=0.006769, audio_tagging_loss=0.01248, over 15704.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09102, pruned_loss=0.01269, audio_tagging_loss=0.008556, over 3038153.23 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:01:39,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2023-11-26 06:01:43,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-26 06:01:54,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3252493.3333333335, ans=0.0 2023-11-26 06:02:02,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3252560.0, ans=0.125 2023-11-26 06:02:05,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3252560.0, ans=0.0 2023-11-26 06:02:09,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3252560.0, ans=0.0 2023-11-26 06:02:10,257 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 06:02:11,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3252626.6666666665, ans=0.125 2023-11-26 06:02:15,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3252626.6666666665, ans=0.125 2023-11-26 06:02:16,115 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-26 06:02:17,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-11-26 06:02:22,933 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6950, loss[loss=0.08123, simple_loss=0.111, pruned_loss=0.01807, audio_tagging_loss=0.007688, over 14841.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09009, pruned_loss=0.01271, audio_tagging_loss=0.008657, over 3039248.54 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:02:37,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3252760.0, ans=0.0 2023-11-26 06:02:38,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2023-11-26 06:02:43,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3252826.6666666665, ans=0.125 2023-11-26 06:02:52,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=15.0 2023-11-26 06:02:54,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3252893.3333333335, ans=0.125 2023-11-26 06:03:02,299 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0 2023-11-26 06:03:10,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.822e+01 9.326e+01 1.010e+02 2.073e+02, threshold=1.865e+02, percent-clipped=1.0 2023-11-26 06:03:12,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-26 06:03:15,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2023-11-26 06:03:17,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3253026.6666666665, ans=0.025 2023-11-26 06:03:18,663 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7000, loss[loss=0.07205, simple_loss=0.1053, pruned_loss=0.01225, audio_tagging_loss=0.00716, over 15516.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09023, pruned_loss=0.01268, audio_tagging_loss=0.008715, over 3037902.77 frames. 
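The WARNING just above shows why certain AudioSet cuts (which carry only dummy placeholder transcripts) get dropped: a 100-frame cut shrinks to 23 encoder frames after subsampling, fewer than its 24 tokens, and the pruned transducer loss needs at least one encoder frame per token. A sketch of such a filter; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23, and the helper names are illustrative:

```python
# Sketch of the filter behind the WARNING lines: a cut is excluded when
# its subsampled frame count falls below its token count, since the
# pruned transducer loss needs T >= U. The subsampling formula is assumed;
# it reproduces the logged 100 frames -> 23 frames.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t = frames_after_subsampling(num_frames)
    # train_asr.py logs the "Exclude cut" WARNING and drops the cut
    # when this check fails.
    return t >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as in the log
```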
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:03:29,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3253093.3333333335, ans=0.0 2023-11-26 06:03:33,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3253093.3333333335, ans=0.0 2023-11-26 06:03:49,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3253160.0, ans=0.125 2023-11-26 06:04:07,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-26 06:04:13,940 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-11-26 06:04:16,285 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7050, loss[loss=0.06292, simple_loss=0.08681, pruned_loss=0.006417, audio_tagging_loss=0.01309, over 14329.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0905, pruned_loss=0.01282, audio_tagging_loss=0.008768, over 3040099.36 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:04:19,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. limit=10.0 2023-11-26 06:04:23,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0 2023-11-26 06:04:24,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3253360.0, ans=0.035 2023-11-26 06:04:25,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3253360.0, ans=0.1 2023-11-26 06:04:32,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3253426.6666666665, ans=0.125 2023-11-26 06:04:38,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-11-26 06:05:02,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.684e+01 9.399e+01 1.022e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 06:05:02,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3253626.6666666665, ans=0.125 2023-11-26 06:05:05,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-26 06:05:10,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3253626.6666666665, ans=0.1 2023-11-26 06:05:12,704 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7100, loss[loss=0.06711, simple_loss=0.0894, pruned_loss=0.01359, audio_tagging_loss=0.008817, over 14612.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09106, pruned_loss=0.01283, audio_tagging_loss=0.008885, over 3037043.42 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:05:19,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3253693.3333333335, ans=0.07 2023-11-26 06:05:30,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3253760.0, ans=0.2 2023-11-26 06:05:33,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3253826.6666666665, ans=0.5 2023-11-26 06:05:35,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-26 06:05:38,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3253826.6666666665, ans=0.05 2023-11-26 06:05:46,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3253893.3333333335, ans=0.125 2023-11-26 06:05:57,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2023-11-26 06:06:02,065 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-26 06:06:08,413 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7150, loss[loss=0.07484, simple_loss=0.09809, pruned_loss=0.01617, audio_tagging_loss=0.00962, over 16682.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09076, pruned_loss=0.01281, audio_tagging_loss=0.008907, over 3043286.83 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:06:09,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3254026.6666666665, ans=0.2 2023-11-26 06:06:16,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.53 vs. 
limit=22.5 2023-11-26 06:06:26,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3254093.3333333335, ans=0.0 2023-11-26 06:06:28,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3254160.0, ans=0.125 2023-11-26 06:06:46,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3254226.6666666665, ans=0.2 2023-11-26 06:06:48,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3254226.6666666665, ans=0.1 2023-11-26 06:06:51,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3254293.3333333335, ans=0.125 2023-11-26 06:06:54,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.934e+01 9.396e+01 1.011e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 06:06:56,979 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-26 06:07:00,278 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:07:03,190 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7200, loss[loss=0.07015, simple_loss=0.09611, pruned_loss=0.01328, audio_tagging_loss=0.008817, over 16207.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09057, pruned_loss=0.01292, audio_tagging_loss=0.00904, over 3048893.39 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:07:04,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3254360.0, ans=0.125 2023-11-26 06:07:05,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3254360.0, ans=0.125 2023-11-26 06:07:28,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=15.0 2023-11-26 06:07:36,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=8.0 2023-11-26 06:07:42,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2023-11-26 06:07:46,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3254626.6666666665, ans=0.125 2023-11-26 06:07:50,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3254626.6666666665, ans=0.0 2023-11-26 06:07:52,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-26 06:07:53,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.25 vs. limit=22.5 2023-11-26 06:07:59,922 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7250, loss[loss=0.07786, simple_loss=0.1052, pruned_loss=0.01377, audio_tagging_loss=0.01149, over 14464.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09076, pruned_loss=0.01275, audio_tagging_loss=0.009052, over 3047416.60 frames. 
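The Whitening lines compare a measured whitening metric of a module's activations against a fixed limit (6.0, 10.0, 15.0 or 22.5 here). The metric is 1.0 when the per-group covariance of the activations is isotropic ("white") and grows as energy concentrates in fewer directions, so "metric=18.53 vs. limit=22.5" means the check passed with margin. A toy reconstruction of such a metric follows, under the assumption that it is the Cauchy-Schwarz-style isotropy ratio sketched here; the exact formula in icefall's scaling.py may differ.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        # x: (num_frames, num_channels). Channels are split into groups and
        # each group's covariance is tested for isotropy. Returns >= 1.0;
        # exactly 1.0 iff every group covariance is a multiple of identity.
        n, c = x.shape
        cg = c // num_groups
        xg = x.reshape(n, num_groups, cg).transpose(0, 1)   # (groups, n, cg)
        cov = torch.matmul(xg.transpose(1, 2), xg) / n      # (groups, cg, cg)
        trace = cov.diagonal(dim1=1, dim2=2).sum(-1)        # sum of eigenvalues
        frob_sq = (cov ** 2).sum(dim=(1, 2))                # sum of squared eigenvalues
        # Cauchy-Schwarz: cg * sum(l_i^2) >= (sum l_i)^2, equal iff all equal.
        metric = cg * frob_sq / trace.clamp(min=1e-20) ** 2
        return metric.mean().item()

    x = torch.randn(2000, 128)                # white-noise "activations"
    print(whitening_metric(x, num_groups=4))  # close to 1.0, under any limit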
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:08:02,728 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:08:11,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3254760.0, ans=0.0 2023-11-26 06:08:18,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3254760.0, ans=0.125 2023-11-26 06:08:18,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-26 06:08:33,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254893.3333333335, ans=0.1 2023-11-26 06:08:47,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.575e+01 9.064e+01 9.788e+01 1.213e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-26 06:08:49,622 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-26 06:08:54,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3254960.0, ans=0.0 2023-11-26 06:08:54,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3254960.0, ans=0.2 2023-11-26 06:08:56,395 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7300, loss[loss=0.06859, simple_loss=0.08671, pruned_loss=0.01283, audio_tagging_loss=0.0124, over 14910.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09111, pruned_loss=0.01284, audio_tagging_loss=0.009017, over 3050284.28 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:08:56,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3255026.6666666665, ans=0.1 2023-11-26 06:08:58,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3255026.6666666665, ans=0.2 2023-11-26 06:09:00,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3255026.6666666665, ans=0.2 2023-11-26 06:09:18,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2023-11-26 06:09:38,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.88 vs. limit=22.5 2023-11-26 06:09:41,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-11-26 06:09:42,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2023-11-26 06:09:45,381 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-26 06:09:51,714 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7350, loss[loss=0.0551, simple_loss=0.07494, pruned_loss=0.01065, audio_tagging_loss=0.006985, over 15062.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09162, pruned_loss=0.01275, audio_tagging_loss=0.008803, over 3045556.18 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:09:58,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3255360.0, ans=0.125 2023-11-26 06:09:59,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3255360.0, ans=0.2 2023-11-26 06:10:21,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3255493.3333333335, ans=0.125 2023-11-26 06:10:39,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.543e+01 9.108e+01 9.776e+01 1.189e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 06:10:40,728 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-26 06:10:47,653 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7400, loss[loss=0.07008, simple_loss=0.09987, pruned_loss=0.01263, audio_tagging_loss=0.007512, over 15168.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09147, pruned_loss=0.01285, audio_tagging_loss=0.008679, over 3046274.76 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:10:47,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3255693.3333333335, ans=0.1 2023-11-26 06:10:53,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-26 06:10:57,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-26 06:11:17,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3255826.6666666665, ans=0.0 2023-11-26 06:11:17,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3255826.6666666665, ans=0.125 2023-11-26 06:11:37,321 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-26 06:11:44,995 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7450, loss[loss=0.08233, simple_loss=0.1166, pruned_loss=0.01851, audio_tagging_loss=0.005512, over 15820.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09103, pruned_loss=0.01283, audio_tagging_loss=0.008604, over 3051752.63 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:11:49,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3256026.6666666665, ans=0.0 2023-11-26 06:12:10,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3256160.0, ans=0.0 2023-11-26 06:12:32,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.793e+01 9.296e+01 1.001e+02 1.337e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 06:12:33,491 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-26 06:12:38,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=15.0 2023-11-26 06:12:39,869 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7500, loss[loss=0.06043, simple_loss=0.08125, pruned_loss=0.01196, audio_tagging_loss=0.007843, over 15593.00 frames. 
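The optim.py lines report five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms plus the active clipping threshold; the numbers are consistent with threshold = Clipping_scale × median, e.g. 2.0 × 9.108e+01 ≈ 1.822e+02 in the entry just above. A sketch of that bookkeeping follows, inferred from the logged numbers rather than copied from icefall's optimizer, so treat it as an approximation.

    from collections import deque
    import torch

    class GradNormClipper:
        """Clip gradients at scale * median of a window of recent norms."""

        def __init__(self, scale: float = 2.0, window: int = 1000):
            self.scale = scale
            self.norms = deque(maxlen=window)
            self.num_clipped = 0
            self.num_seen = 0

        def clip_(self, params: list) -> float:
            # Measure the total grad norm without clipping (max_norm=inf).
            norm = float(torch.nn.utils.clip_grad_norm_(params, float("inf")))
            self.norms.append(norm)
            t = torch.tensor(list(self.norms))
            qs = [torch.quantile(t, q).item() for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
            threshold = self.scale * qs[2]  # 2.0 * median, as in the log
            self.num_seen += 1
            if norm > threshold:
                self.num_clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            # The log's percent-clipped appears to reset each reporting
            # interval; this cumulative version is a simplification.
            print("grad-norm quartiles "
                  + " ".join(f"{q:.3e}" for q in qs)
                  + f", threshold={threshold:.3e}, percent-clipped="
                  + f"{100.0 * self.num_clipped / self.num_seen:.1f}")
            return norm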
], tot_loss[loss=0.0667, simple_loss=0.09091, pruned_loss=0.0127, audio_tagging_loss=0.008543, over 3051469.67 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:13:29,013 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-26 06:13:35,297 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7550, loss[loss=0.08055, simple_loss=0.1122, pruned_loss=0.01619, audio_tagging_loss=0.008239, over 15032.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08947, pruned_loss=0.01246, audio_tagging_loss=0.008638, over 3038006.18 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:13:40,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3256693.3333333335, ans=0.125 2023-11-26 06:14:02,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3256826.6666666665, ans=0.07 2023-11-26 06:14:04,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3256826.6666666665, ans=0.125 2023-11-26 06:14:06,543 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0 2023-11-26 06:14:23,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 9.000e+01 9.495e+01 1.038e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 06:14:24,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2023-11-26 06:14:25,071 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-26 06:14:31,444 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7600, loss[loss=0.07235, simple_loss=0.1016, pruned_loss=0.01362, audio_tagging_loss=0.007941, over 13403.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08954, pruned_loss=0.0125, audio_tagging_loss=0.008732, over 3035271.22 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:14:36,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0 2023-11-26 06:15:14,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3257226.6666666665, ans=0.025 2023-11-26 06:15:19,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3257293.3333333335, ans=0.125 2023-11-26 06:15:20,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-26 06:15:27,878 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7650, loss[loss=0.04072, simple_loss=0.04271, pruned_loss=0.006786, audio_tagging_loss=0.01258, over 16521.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08911, pruned_loss=0.01249, audio_tagging_loss=0.008661, over 3034128.12 frames. ], batch size: 65, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:15:28,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3257360.0, ans=0.125 2023-11-26 06:15:30,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. 
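Each train_asr.py:1235 entry pairs the current batch's loss with tot_loss, a running average of every loss component weighted by the number of acoustic frames contributing to it (hence "over 3051469.67 frames"). A minimal frame-weighted tracker in that spirit is sketched below, seeded with the batch 7500 entry above; the actual bookkeeping in icefall also decays old statistics periodically, so this is illustrative only.

    class LossTracker:
        """Frame-weighted running averages of named loss components."""

        def __init__(self):
            self.frames = 0.0
            self.sums = {}

        def update(self, num_frames: float, **losses):
            self.frames += num_frames
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

        def averages(self) -> dict:
            return {k: v / self.frames for k, v in self.sums.items()}

    tot = LossTracker()
    tot.update(15593.0, loss=0.06043, simple_loss=0.08125,
               pruned_loss=0.01196, audio_tagging_loss=0.007843)
    print(tot.averages(), f"over {tot.frames:.2f} frames")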
limit=15.0 2023-11-26 06:15:36,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3257360.0, ans=0.0 2023-11-26 06:15:52,447 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:16:00,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3257560.0, ans=0.1 2023-11-26 06:16:00,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-26 06:16:12,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-26 06:16:16,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.718e+01 9.418e+01 1.004e+02 2.180e+02, threshold=1.884e+02, percent-clipped=1.0 2023-11-26 06:16:16,771 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-26 06:16:16,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3257626.6666666665, ans=0.1 2023-11-26 06:16:23,072 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7700, loss[loss=0.06555, simple_loss=0.08761, pruned_loss=0.01244, audio_tagging_loss=0.009305, over 14412.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08948, pruned_loss=0.01248, audio_tagging_loss=0.008693, over 3038363.61 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:16:41,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3257760.0, ans=0.1 2023-11-26 06:16:49,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3257826.6666666665, ans=0.0 2023-11-26 06:17:12,446 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-26 06:17:13,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3257960.0, ans=0.2 2023-11-26 06:17:19,365 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7750, loss[loss=0.06807, simple_loss=0.0878, pruned_loss=0.01417, audio_tagging_loss=0.01, over 14667.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08936, pruned_loss=0.01243, audio_tagging_loss=0.008804, over 3034677.84 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:17:21,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. 
limit=6.0 2023-11-26 06:17:28,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3258026.6666666665, ans=0.125 2023-11-26 06:17:45,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3258160.0, ans=0.0 2023-11-26 06:18:01,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3258226.6666666665, ans=0.09899494936611666 2023-11-26 06:18:08,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.599e+01 9.200e+01 9.734e+01 1.299e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 06:18:08,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-26 06:18:15,095 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7800, loss[loss=0.05192, simple_loss=0.07115, pruned_loss=0.008243, audio_tagging_loss=0.008099, over 15459.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08991, pruned_loss=0.0125, audio_tagging_loss=0.008846, over 3030670.84 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:18:31,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3258426.6666666665, ans=0.0 2023-11-26 06:18:31,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3258426.6666666665, ans=0.125 2023-11-26 06:18:34,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3258426.6666666665, ans=0.1 2023-11-26 06:18:40,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-26 06:18:40,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2023-11-26 06:18:54,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3258560.0, ans=0.125 2023-11-26 06:18:56,272 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-26 06:19:04,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-26 06:19:11,388 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7850, loss[loss=0.06228, simple_loss=0.08272, pruned_loss=0.0114, audio_tagging_loss=0.00952, over 15580.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.0904, pruned_loss=0.01237, audio_tagging_loss=0.008935, over 3036945.44 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:19:12,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3258693.3333333335, ans=0.125 2023-11-26 06:19:16,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3258693.3333333335, ans=0.0 2023-11-26 06:19:23,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3258760.0, ans=0.0 2023-11-26 06:19:23,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3258760.0, ans=0.2 2023-11-26 06:19:25,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.88 vs. limit=10.0 2023-11-26 06:19:28,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3258760.0, ans=0.125 2023-11-26 06:19:35,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3258826.6666666665, ans=0.0 2023-11-26 06:19:38,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3258826.6666666665, ans=0.0 2023-11-26 06:19:43,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3258893.3333333335, ans=0.2 2023-11-26 06:19:57,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3258960.0, ans=0.05 2023-11-26 06:20:00,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.695e+01 9.194e+01 9.770e+01 1.489e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 06:20:00,850 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-26 06:20:00,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3258960.0, ans=0.1 2023-11-26 06:20:06,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3259026.6666666665, ans=0.07 2023-11-26 06:20:07,638 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7900, loss[loss=0.05471, simple_loss=0.07749, pruned_loss=0.005514, audio_tagging_loss=0.01045, over 15051.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0907, pruned_loss=0.0124, audio_tagging_loss=0.008843, over 3050054.09 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:20:08,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3259026.6666666665, ans=0.0 2023-11-26 06:20:21,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. 
limit=15.0 2023-11-26 06:20:51,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3259293.3333333335, ans=0.1 2023-11-26 06:20:57,382 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-26 06:21:03,737 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7950, loss[loss=0.08005, simple_loss=0.1069, pruned_loss=0.01381, audio_tagging_loss=0.01278, over 15930.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09061, pruned_loss=0.01247, audio_tagging_loss=0.008932, over 3050998.27 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:21:07,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3259360.0, ans=0.0 2023-11-26 06:21:16,901 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:21:26,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3259493.3333333335, ans=0.125 2023-11-26 06:21:37,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3259560.0, ans=0.5 2023-11-26 06:21:42,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3259560.0, ans=22.5 2023-11-26 06:21:44,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3259560.0, ans=0.0 2023-11-26 06:21:52,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.753e+01 9.407e+01 1.023e+02 1.871e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-26 06:21:52,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-26 06:21:59,113 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8000, loss[loss=0.05415, simple_loss=0.06919, pruned_loss=0.009003, audio_tagging_loss=0.01055, over 15601.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.0906, pruned_loss=0.01257, audio_tagging_loss=0.009014, over 3046559.24 frames. 
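The WARNING above drops an AudioSet cut whose placeholder transcript is longer than the subsampled feature sequence: 100 input frames shrink to 23 after the convolutional subsampling, but the dummy text tokenizes to 24 BPE pieces, and a transducer loss cannot align more output tokens than it has frames. A sketch of that admission check follows; the subsampling arithmetic is my assumption, chosen because it reproduces the logged 100 -> 23, and the exact formula in the model code may differ.

    def subsampled_frames(t: int) -> int:
        # Assumed frame arithmetic of the convolutional front end; it
        # reproduces the WARNING's numbers (100 frames -> 23).
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one frame per emitted token: T' >= U.
        return subsampled_frames(num_frames) >= num_tokens

    print(subsampled_frames(100))  # 23
    print(keep_cut(100, 24))       # False -> "Exclude cut with ID ..."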
], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:22:23,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3259826.6666666665, ans=0.05 2023-11-26 06:22:28,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3259826.6666666665, ans=0.0 2023-11-26 06:22:36,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-26 06:22:37,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3259893.3333333335, ans=0.1 2023-11-26 06:22:48,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-26 06:22:51,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-26 06:22:55,044 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8050, loss[loss=0.08671, simple_loss=0.1149, pruned_loss=0.0215, audio_tagging_loss=0.007773, over 14788.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.0899, pruned_loss=0.01253, audio_tagging_loss=0.009141, over 3046853.11 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:22:56,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3260026.6666666665, ans=0.0 2023-11-26 06:22:56,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3260026.6666666665, ans=0.04949747468305833 2023-11-26 06:22:59,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3260026.6666666665, ans=0.125 2023-11-26 06:23:05,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2023-11-26 06:23:11,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3260093.3333333335, ans=0.0 2023-11-26 06:23:12,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3260093.3333333335, ans=0.125 2023-11-26 06:23:15,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3260093.3333333335, ans=0.0 2023-11-26 06:23:43,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. 
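grad_scale in the training lines is the current scale of the fp16 dynamic loss scaler (this run trains with use_fp16 enabled): it doubles after a stretch of overflow-free steps and halves whenever a step produces inf/nan gradients, which is why it oscillates between 8.0, 16.0 and 32.0 across nearby batches, as in the 32.0 -> 16.0 drop between batches 8000 and 8050 above. The mechanism is standard torch.cuda.amp usage; this is a generic sketch, not this script's exact training loop.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # dynamic loss scaling

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()  # backprop through the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # halves on overflow, doubles periodically
        return loss.detach(), scaler.get_scale()  # -> the logged "grad_scale"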
limit=15.0 2023-11-26 06:23:44,635 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-26 06:23:45,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3260293.3333333335, ans=0.1 2023-11-26 06:23:45,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-26 06:23:46,128 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.810e+01 9.339e+01 9.946e+01 1.266e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 06:23:48,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-26 06:23:51,452 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8100, loss[loss=0.05187, simple_loss=0.07171, pruned_loss=0.008686, audio_tagging_loss=0.007335, over 15410.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08981, pruned_loss=0.01243, audio_tagging_loss=0.009042, over 3045885.73 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:23:52,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3260360.0, ans=0.0 2023-11-26 06:23:58,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3260360.0, ans=0.09899494936611666 2023-11-26 06:24:09,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-26 06:24:36,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3260626.6666666665, ans=0.125 2023-11-26 06:24:40,655 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-26 06:24:44,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3260626.6666666665, ans=0.07 2023-11-26 06:24:46,946 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8150, loss[loss=0.07814, simple_loss=0.1054, pruned_loss=0.01487, audio_tagging_loss=0.01055, over 15508.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09018, pruned_loss=0.01258, audio_tagging_loss=0.008921, over 3053173.95 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:24:52,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3260693.3333333335, ans=0.125 2023-11-26 06:25:05,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3260760.0, ans=0.125 2023-11-26 06:25:12,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.94 vs. limit=10.0 2023-11-26 06:25:14,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3260826.6666666665, ans=0.07 2023-11-26 06:25:22,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. 
limit=12.0 2023-11-26 06:25:26,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3260893.3333333335, ans=0.125 2023-11-26 06:25:29,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3260893.3333333335, ans=0.05 2023-11-26 06:25:35,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-26 06:25:37,969 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.636e+01 9.236e+01 1.005e+02 1.829e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 06:25:38,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.61 vs. limit=15.0 2023-11-26 06:25:43,410 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8200, loss[loss=0.0667, simple_loss=0.09265, pruned_loss=0.01301, audio_tagging_loss=0.007364, over 15435.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08934, pruned_loss=0.01231, audio_tagging_loss=0.008842, over 3049596.47 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 06:25:44,462 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:26:00,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3261093.3333333335, ans=0.05 2023-11-26 06:26:00,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261093.3333333335, ans=0.1 2023-11-26 06:26:19,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3261226.6666666665, ans=0.1 2023-11-26 06:26:33,277 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-26 06:26:35,789 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:26:40,311 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8250, loss[loss=0.06148, simple_loss=0.09105, pruned_loss=0.007117, audio_tagging_loss=0.008836, over 16071.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09015, pruned_loss=0.01236, audio_tagging_loss=0.008734, over 3038775.65 frames. 
], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:26:41,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3261360.0, ans=0.0 2023-11-26 06:26:53,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3261426.6666666665, ans=0.125 2023-11-26 06:27:29,806 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-26 06:27:31,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 8.764e+01 9.523e+01 1.021e+02 1.378e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 06:27:36,100 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8300, loss[loss=0.06928, simple_loss=0.09475, pruned_loss=0.01416, audio_tagging_loss=0.007749, over 15989.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09042, pruned_loss=0.01228, audio_tagging_loss=0.008697, over 3046069.88 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:27:41,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3261693.3333333335, ans=0.0 2023-11-26 06:27:54,301 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:28:09,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3261893.3333333335, ans=0.125 2023-11-26 06:28:13,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3261893.3333333335, ans=0.125 2023-11-26 06:28:25,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-26 06:28:32,185 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8350, loss[loss=0.092, simple_loss=0.1239, pruned_loss=0.02346, audio_tagging_loss=0.006595, over 14773.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08946, pruned_loss=0.0122, audio_tagging_loss=0.008666, over 3048256.89 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:29:20,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0 2023-11-26 06:29:21,884 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-26 06:29:23,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.707e+01 9.107e+01 9.856e+01 1.432e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 06:29:28,786 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8400, loss[loss=0.06514, simple_loss=0.0985, pruned_loss=0.008295, audio_tagging_loss=0.007599, over 15160.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08891, pruned_loss=0.01221, audio_tagging_loss=0.008657, over 3045604.79 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:29:39,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3262426.6666666665, ans=0.05 2023-11-26 06:29:42,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3262426.6666666665, ans=0.125 2023-11-26 06:29:48,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3262426.6666666665, ans=0.125 2023-11-26 06:29:57,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-26 06:30:16,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2023-11-26 06:30:17,889 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-26 06:30:24,462 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8450, loss[loss=0.08615, simple_loss=0.1349, pruned_loss=0.01401, audio_tagging_loss=0.004676, over 16467.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.0892, pruned_loss=0.01228, audio_tagging_loss=0.008656, over 3044575.65 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:30:26,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3262693.3333333335, ans=0.0 2023-11-26 06:30:36,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-26 06:30:37,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3262760.0, ans=0.125 2023-11-26 06:30:45,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3262826.6666666665, ans=0.07 2023-11-26 06:31:06,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3262893.3333333335, ans=0.125 2023-11-26 06:31:13,402 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-26 06:31:15,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.909e+01 9.451e+01 1.011e+02 1.331e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 06:31:15,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3262960.0, ans=0.125 2023-11-26 06:31:18,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-26 06:31:20,190 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8500, loss[loss=0.06715, simple_loss=0.09312, pruned_loss=0.01121, audio_tagging_loss=0.009379, over 15878.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08909, pruned_loss=0.0123, audio_tagging_loss=0.008716, over 3046764.91 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:31:24,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3263026.6666666665, ans=0.125 2023-11-26 06:31:53,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3263226.6666666665, ans=0.125 2023-11-26 06:32:09,401 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-26 06:32:11,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3263293.3333333335, ans=0.0 2023-11-26 06:32:16,188 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8550, loss[loss=0.07454, simple_loss=0.1025, pruned_loss=0.01293, audio_tagging_loss=0.01035, over 16117.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08947, pruned_loss=0.0123, audio_tagging_loss=0.00878, over 3044679.72 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:32:21,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3263360.0, ans=0.0 2023-11-26 06:32:29,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3263426.6666666665, ans=0.125 2023-11-26 06:32:44,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=12.0 2023-11-26 06:33:00,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3263626.6666666665, ans=0.0 2023-11-26 06:33:02,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=12.0 2023-11-26 06:33:05,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-26 06:33:07,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.883e+01 9.307e+01 9.956e+01 1.247e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 06:33:11,981 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8600, loss[loss=0.06287, simple_loss=0.08298, pruned_loss=0.01203, audio_tagging_loss=0.009346, over 14395.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08967, pruned_loss=0.01238, audio_tagging_loss=0.008791, over 3047550.57 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:33:12,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3263693.3333333335, ans=0.125 2023-11-26 06:33:13,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3263693.3333333335, ans=0.1 2023-11-26 06:33:13,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3263693.3333333335, ans=0.1 2023-11-26 06:33:48,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3263893.3333333335, ans=0.07 2023-11-26 06:34:00,555 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-26 06:34:07,034 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8650, loss[loss=0.05218, simple_loss=0.06475, pruned_loss=0.00848, audio_tagging_loss=0.01132, over 15975.00 frames. 
], tot_loss[loss=0.06604, simple_loss=0.08958, pruned_loss=0.01237, audio_tagging_loss=0.008879, over 3045260.49 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:34:10,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3264026.6666666665, ans=0.125 2023-11-26 06:34:28,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3264093.3333333335, ans=0.1 2023-11-26 06:34:39,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2023-11-26 06:34:41,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3264226.6666666665, ans=0.125 2023-11-26 06:34:49,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3264226.6666666665, ans=0.0 2023-11-26 06:34:50,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3264293.3333333335, ans=0.1 2023-11-26 06:34:56,643 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-26 06:34:58,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.575e+01 9.501e+01 1.015e+02 1.798e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 06:34:59,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2023-11-26 06:35:03,400 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8700, loss[loss=0.06406, simple_loss=0.09175, pruned_loss=0.01025, audio_tagging_loss=0.007935, over 14352.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.0909, pruned_loss=0.01255, audio_tagging_loss=0.008921, over 3046458.67 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:35:15,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3264426.6666666665, ans=0.0 2023-11-26 06:35:33,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3264493.3333333335, ans=0.125 2023-11-26 06:35:43,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3264560.0, ans=0.0 2023-11-26 06:35:52,871 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-26 06:35:55,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3264626.6666666665, ans=0.125 2023-11-26 06:35:59,776 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8750, loss[loss=0.0552, simple_loss=0.07263, pruned_loss=0.0108, audio_tagging_loss=0.00808, over 14558.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.0903, pruned_loss=0.01243, audio_tagging_loss=0.008976, over 3043637.37 frames. 
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:36:35,543 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:36:48,798 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-26 06:36:50,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.719e+01 9.577e+01 1.009e+02 1.331e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 06:36:52,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3264960.0, ans=0.05 2023-11-26 06:36:55,065 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8800, loss[loss=0.04578, simple_loss=0.05823, pruned_loss=0.003572, audio_tagging_loss=0.01309, over 14379.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09108, pruned_loss=0.01254, audio_tagging_loss=0.009084, over 3048357.97 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:37:01,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3265026.6666666665, ans=0.2 2023-11-26 06:37:15,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3265093.3333333335, ans=0.125 2023-11-26 06:37:30,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3265226.6666666665, ans=0.0 2023-11-26 06:37:33,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3265226.6666666665, ans=0.125 2023-11-26 06:37:36,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3265226.6666666665, ans=0.125 2023-11-26 06:37:44,507 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-26 06:37:49,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3265293.3333333335, ans=10.0 2023-11-26 06:37:51,520 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8850, loss[loss=0.08167, simple_loss=0.1112, pruned_loss=0.01605, audio_tagging_loss=0.009998, over 15964.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09037, pruned_loss=0.01253, audio_tagging_loss=0.009167, over 3047719.81 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:38:02,758 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 06:38:12,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3265493.3333333335, ans=0.1 2023-11-26 06:38:20,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3265493.3333333335, ans=0.125 2023-11-26 06:38:29,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3265560.0, ans=0.1 2023-11-26 06:38:40,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-26 06:38:41,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3265626.6666666665, ans=0.125 2023-11-26 06:38:44,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.713e+01 9.492e+01 1.007e+02 1.202e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 06:38:47,355 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8900, loss[loss=0.06114, simple_loss=0.0866, pruned_loss=0.01265, audio_tagging_loss=0.005192, over 14748.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09064, pruned_loss=0.01262, audio_tagging_loss=0.008978, over 3044158.39 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:38:48,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3265693.3333333335, ans=0.1 2023-11-26 06:38:49,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3265693.3333333335, ans=0.05 2023-11-26 06:39:04,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0 2023-11-26 06:39:10,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-26 06:39:28,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3265893.3333333335, ans=0.04949747468305833 2023-11-26 06:39:36,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3265960.0, ans=0.125 2023-11-26 06:39:36,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-26 06:39:43,102 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8950, loss[loss=0.08098, simple_loss=0.1141, pruned_loss=0.01575, audio_tagging_loss=0.008193, over 15421.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09043, pruned_loss=0.01269, audio_tagging_loss=0.008881, over 3047429.13 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:39:52,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3266026.6666666665, ans=0.0 2023-11-26 06:40:32,187 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-26 06:40:35,291 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.881e+01 9.559e+01 9.968e+01 1.237e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 06:40:38,547 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9000, loss[loss=0.06737, simple_loss=0.08976, pruned_loss=0.01274, audio_tagging_loss=0.00975, over 14833.00 frames. 
], tot_loss[loss=0.06669, simple_loss=0.09052, pruned_loss=0.01268, audio_tagging_loss=0.008747, over 3046501.53 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:40:38,548 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 06:41:04,350 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1326, 2.3963, 5.0222, 2.8984], device='cuda:2') 2023-11-26 06:41:10,852 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05057, pruned_loss=0.005166, audio_tagging_loss=0.0279, over 4681554.00 frames. 2023-11-26 06:41:10,852 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 06:41:12,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3266360.0, ans=0.2 2023-11-26 06:41:12,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3266360.0, ans=0.0 2023-11-26 06:41:16,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3266360.0, ans=0.125 2023-11-26 06:41:35,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3266493.3333333335, ans=0.125 2023-11-26 06:41:59,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-26 06:42:04,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3266626.6666666665, ans=0.2 2023-11-26 06:42:06,615 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9050, loss[loss=0.05594, simple_loss=0.0772, pruned_loss=0.008639, audio_tagging_loss=0.008703, over 15154.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09144, pruned_loss=0.01262, audio_tagging_loss=0.008533, over 3050290.41 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:42:12,967 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2023-11-26 06:42:27,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3266760.0, ans=0.125 2023-11-26 06:42:29,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3266826.6666666665, ans=0.125 2023-11-26 06:42:44,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3266893.3333333335, ans=0.125 2023-11-26 06:42:56,364 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-26 06:42:59,334 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.756e+01 9.461e+01 1.032e+02 1.293e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 06:43:03,100 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9100, loss[loss=0.06746, simple_loss=0.08794, pruned_loss=0.01725, audio_tagging_loss=0.006238, over 14412.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09116, pruned_loss=0.01256, audio_tagging_loss=0.008514, over 3051353.08 frames. 
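At batch 9000 the loop pauses for its periodic validation pass: the model runs over the full dev set (4681554 frames) with gradients disabled, zipformer.py logs the entropy of selected attention-weight distributions as a collapse check (entropies well above zero, as in the tensor above, mean attention is still spread over many frames), and peak GPU memory is reported. A hedged sketch of both computations, assuming a loss_fn that returns a frame-weighted loss sum:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dataloader, loss_fn) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in dataloader:
            loss_sum, num_frames = loss_fn(model, batch)
            tot_loss += float(loss_sum)
            tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames  # e.g. loss=0.05835 over 4681554 frames

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len) with rows summing to 1.
        # Per-head entropy averaged over queries; a 4-head layer yields a
        # 4-element tensor like the one logged by zipformer.py above.
        return -(attn * (attn + 1e-20).log()).sum(dim=-1).mean(dim=-1)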
2023-11-26 06:43:03,100 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9100, loss[loss=0.06746, simple_loss=0.08794, pruned_loss=0.01725, audio_tagging_loss=0.006238, over 14412.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09116, pruned_loss=0.01256, audio_tagging_loss=0.008514, over 3051353.08 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 06:43:29,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3267160.0, ans=0.04949747468305833
2023-11-26 06:43:52,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490100
2023-11-26 06:43:56,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3267293.3333333335, ans=0.1
2023-11-26 06:43:58,969 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9150, loss[loss=0.077, simple_loss=0.1079, pruned_loss=0.01493, audio_tagging_loss=0.008124, over 15288.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09033, pruned_loss=0.01244, audio_tagging_loss=0.008667, over 3050032.64 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 06:44:29,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3267493.3333333335, ans=0.125
2023-11-26 06:44:45,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3267626.6666666665, ans=0.125
2023-11-26 06:44:47,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490150
2023-11-26 06:44:50,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.875e+01 9.458e+01 1.013e+02 1.353e+02, threshold=1.892e+02, percent-clipped=0.0
2023-11-26 06:44:54,046 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9200, loss[loss=0.069, simple_loss=0.09503, pruned_loss=0.01258, audio_tagging_loss=0.008912, over 15419.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09021, pruned_loss=0.01239, audio_tagging_loss=0.008781, over 3048261.15 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:45:17,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3267826.6666666665, ans=0.125
2023-11-26 06:45:35,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0
2023-11-26 06:45:43,768 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490200
2023-11-26 06:45:49,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3268026.6666666665, ans=0.1
2023-11-26 06:45:51,047 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9250, loss[loss=0.06457, simple_loss=0.09682, pruned_loss=0.01042, audio_tagging_loss=0.005736, over 15551.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08946, pruned_loss=0.01235, audio_tagging_loss=0.008788, over 3052568.34 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:46:01,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2023-11-26 06:46:03,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3268093.3333333335, ans=0.125
2023-11-26 06:46:05,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0
2023-11-26 06:46:23,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2023-11-26 06:46:27,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3268226.6666666665, ans=0.035
2023-11-26 06:46:29,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3268226.6666666665, ans=0.0
2023-11-26 06:46:35,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3268293.3333333335, ans=0.125
2023-11-26 06:46:39,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490250
2023-11-26 06:46:43,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.603e+01 9.080e+01 9.924e+01 1.383e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-26 06:46:46,720 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9300, loss[loss=0.06082, simple_loss=0.0726, pruned_loss=0.01089, audio_tagging_loss=0.01363, over 14727.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08933, pruned_loss=0.01243, audio_tagging_loss=0.008774, over 3054378.11 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:46:51,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3268360.0, ans=0.0
2023-11-26 06:46:57,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3268426.6666666665, ans=0.0
2023-11-26 06:47:22,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3268560.0, ans=0.125
2023-11-26 06:47:35,439 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490300
2023-11-26 06:47:41,731 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9350, loss[loss=0.0496, simple_loss=0.07068, pruned_loss=0.006431, audio_tagging_loss=0.007835, over 15848.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08951, pruned_loss=0.01237, audio_tagging_loss=0.008829, over 3056260.99 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:47:46,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3268693.3333333335, ans=0.1
2023-11-26 06:47:52,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0
2023-11-26 06:47:54,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0
2023-11-26 06:48:06,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=12.0
2023-11-26 06:48:18,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0
2023-11-26 06:48:19,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3268893.3333333335, ans=0.05
2023-11-26 06:48:26,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3268960.0, ans=0.2
2023-11-26 06:48:27,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0
2023-11-26 06:48:31,062 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490350
2023-11-26 06:48:34,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.924e+01 9.559e+01 1.022e+02 1.389e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-26 06:48:37,928 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9400, loss[loss=0.09368, simple_loss=0.1374, pruned_loss=0.01941, audio_tagging_loss=0.005569, over 15556.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09054, pruned_loss=0.01275, audio_tagging_loss=0.008816, over 3065044.94 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:48:41,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3269026.6666666665, ans=0.125
2023-11-26 06:48:54,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3269093.3333333335, ans=0.0
2023-11-26 06:48:54,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3269093.3333333335, ans=15.0
2023-11-26 06:48:57,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3269093.3333333335, ans=0.0
2023-11-26 06:49:04,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3269160.0, ans=0.125
2023-11-26 06:49:18,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3269226.6666666665, ans=0.1
2023-11-26 06:49:27,317 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490400
2023-11-26 06:49:32,334 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
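The WARNING above fires for AudioSet clips that carry only a placeholder transcript: after subsampling the encoder would emit 23 frames, fewer than the 24 BPE tokens, and a transducer cannot align a label sequence longer than its frame sequence, so the cut is dropped. Below is a hedged sketch of such a filter; the subsampling formula is an assumption chosen to reproduce 100 -> 23 and may differ from the actual encoder_embed arithmetic.

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d front-end arithmetic; picked so that 100 input frames -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    # A transducer needs at least as many output frames as target tokens.
    return frames_after_subsampling(num_frames) >= len(tokens)

# The excluded cut above: 100 frames -> 23 after subsampling, but 24 tokens.
assert frames_after_subsampling(100) == 23
assert not keep_cut(100, ["tok"] * 24)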
2023-11-26 06:49:34,412 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9450, loss[loss=0.0913, simple_loss=0.1174, pruned_loss=0.02298, audio_tagging_loss=0.00964, over 15210.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09104, pruned_loss=0.0128, audio_tagging_loss=0.008884, over 3068498.85 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:49:44,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3269426.6666666665, ans=0.1
2023-11-26 06:50:23,253 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490450
2023-11-26 06:50:26,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.849e+01 9.435e+01 1.031e+02 1.248e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 06:50:29,504 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9500, loss[loss=0.05655, simple_loss=0.08498, pruned_loss=0.005965, audio_tagging_loss=0.008091, over 15771.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09105, pruned_loss=0.01279, audio_tagging_loss=0.008947, over 3065153.73 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:50:37,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3269693.3333333335, ans=0.05
2023-11-26 06:50:59,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3269826.6666666665, ans=15.0
2023-11-26 06:51:08,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0
2023-11-26 06:51:18,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490500
2023-11-26 06:51:23,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3269960.0, ans=0.125
2023-11-26 06:51:25,482 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9550, loss[loss=0.04806, simple_loss=0.0532, pruned_loss=0.008378, audio_tagging_loss=0.01309, over 15732.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08979, pruned_loss=0.01258, audio_tagging_loss=0.009043, over 3058203.34 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:51:28,191 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0
2023-11-26 06:51:38,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3270093.3333333335, ans=0.125
2023-11-26 06:51:46,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3270093.3333333335, ans=0.035
2023-11-26 06:52:06,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3270226.6666666665, ans=0.2
2023-11-26 06:52:09,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3270293.3333333335, ans=0.125
2023-11-26 06:52:15,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490550
2023-11-26 06:52:18,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.931e+01 9.591e+01 1.034e+02 1.211e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-26 06:52:19,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3270293.3333333335, ans=0.1
2023-11-26 06:52:22,418 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9600, loss[loss=0.05491, simple_loss=0.07324, pruned_loss=0.009691, audio_tagging_loss=0.008596, over 14132.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08909, pruned_loss=0.01247, audio_tagging_loss=0.009092, over 3051158.84 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:52:59,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3270560.0, ans=0.125
2023-11-26 06:53:11,611 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490600
2023-11-26 06:53:11,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3270626.6666666665, ans=0.0
2023-11-26 06:53:12,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3270626.6666666665, ans=10.0
2023-11-26 06:53:18,187 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9650, loss[loss=0.06216, simple_loss=0.08501, pruned_loss=0.01323, audio_tagging_loss=0.006419, over 15908.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08872, pruned_loss=0.01255, audio_tagging_loss=0.008979, over 3044890.68 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:53:20,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3270693.3333333335, ans=0.125
2023-11-26 06:53:34,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3270760.0, ans=0.1
2023-11-26 06:53:35,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.40 vs. limit=10.0
2023-11-26 06:53:57,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3270893.3333333335, ans=0.0
2023-11-26 06:53:58,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3270893.3333333335, ans=0.0
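The scaling.py:213 lines track ScheduledFloat parameters: regularization constants (dropout probabilities, skip rates, balancer targets) that are not learned but scheduled as functions of the batch count (the fractional batch_count values such as 3270893.3333333335 suggest a count normalized across the DDP ranks). A minimal sketch of such a schedule, assuming simple piecewise-linear interpolation between breakpoints; the class name and breakpoints are illustrative, not icefall's ScheduledFloat.

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch count (a sketch, not the real class)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Late in training a schedule sits at its final value, which would explain why
# the records above keep printing constants such as ans=0.1 or ans=0.125.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value_at(3270893.0) == 0.1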
2023-11-26 06:54:05,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=10.0
2023-11-26 06:54:07,190 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490650
2023-11-26 06:54:10,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.629e+01 9.120e+01 1.007e+02 1.405e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-26 06:54:13,990 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9700, loss[loss=0.05782, simple_loss=0.07407, pruned_loss=0.01306, audio_tagging_loss=0.007726, over 15464.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08864, pruned_loss=0.01242, audio_tagging_loss=0.008926, over 3039036.67 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:54:15,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271026.6666666665, ans=0.1
2023-11-26 06:54:42,520 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 06:54:43,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3271160.0, ans=0.125
2023-11-26 06:54:53,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3271226.6666666665, ans=0.125
2023-11-26 06:54:54,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3271226.6666666665, ans=0.0
2023-11-26 06:55:03,189 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490700
2023-11-26 06:55:07,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3271293.3333333335, ans=0.125
2023-11-26 06:55:10,690 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9750, loss[loss=0.0653, simple_loss=0.09059, pruned_loss=0.01054, audio_tagging_loss=0.00947, over 14568.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08975, pruned_loss=0.01248, audio_tagging_loss=0.008775, over 3047376.69 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:55:14,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3271360.0, ans=0.0
2023-11-26 06:55:14,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3271360.0, ans=0.125
2023-11-26 06:55:16,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5
2023-11-26 06:55:24,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3271426.6666666665, ans=0.125
2023-11-26 06:55:28,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3271426.6666666665, ans=0.0
2023-11-26 06:55:29,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271426.6666666665, ans=0.1
2023-11-26 06:55:33,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3271493.3333333335, ans=0.0
2023-11-26 06:55:39,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3271493.3333333335, ans=0.125
2023-11-26 06:55:59,908 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490750
2023-11-26 06:56:03,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.706e+01 9.282e+01 1.012e+02 1.180e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-26 06:56:06,095 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9800, loss[loss=0.04924, simple_loss=0.07103, pruned_loss=0.007703, audio_tagging_loss=0.006025, over 14855.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0901, pruned_loss=0.01245, audio_tagging_loss=0.008647, over 3046833.78 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 06:56:21,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3271760.0, ans=0.125
2023-11-26 06:56:30,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0
2023-11-26 06:56:31,590 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 06:56:49,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0
2023-11-26 06:56:55,068 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 06:56:55,095 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490800
2023-11-26 06:56:55,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3271960.0, ans=10.0
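The scaling.py:1022 Whitening lines above are diagnostics from a regularizer that pushes the channel covariance of an activation toward a multiple of the identity; the logging appears to trigger when the measured metric approaches or exceeds the module's limit (metric=14.05 vs. limit=15.0 above). The exact metric used by the Whiten module is not reproduced here; the sketch below uses one plausible isotropy measure (mean squared covariance eigenvalue over the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as the spectrum spreads) purely as an assumed stand-in.

import torch

def whitening_metric_sketch(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). Returns a spread measure of the covariance
    # spectrum; an assumed stand-in for the logged "metric", not icefall's formula.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

# White noise gives a metric near 1.0; a low-rank signal drives it much higher.
x = torch.randn(4000, 256)
assert whitening_metric_sketch(x) < 2.0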
2023-11-26 06:57:01,824 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9850, loss[loss=0.06341, simple_loss=0.09069, pruned_loss=0.009091, audio_tagging_loss=0.008973, over 15375.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09084, pruned_loss=0.01261, audio_tagging_loss=0.008544, over 3045572.64 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 06:57:04,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3272026.6666666665, ans=0.05
2023-11-26 06:57:08,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3272026.6666666665, ans=0.125
2023-11-26 06:57:24,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3272160.0, ans=0.1
2023-11-26 06:57:36,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3272226.6666666665, ans=0.2
2023-11-26 06:57:44,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0
2023-11-26 06:57:44,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3272226.6666666665, ans=0.125
2023-11-26 06:57:51,703 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490850
2023-11-26 06:57:51,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3272293.3333333335, ans=0.0
2023-11-26 06:57:52,975 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 06:57:56,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.658e+01 9.556e+01 1.029e+02 1.537e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 06:57:58,694 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9900, loss[loss=0.05859, simple_loss=0.07958, pruned_loss=0.008892, audio_tagging_loss=0.009902, over 15241.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09143, pruned_loss=0.01275, audio_tagging_loss=0.008564, over 3048210.96 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 06:58:00,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3272360.0, ans=0.125
2023-11-26 06:58:05,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0
2023-11-26 06:58:23,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3272493.3333333335, ans=0.1
2023-11-26 06:58:45,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0
2023-11-26 06:58:48,410 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490900
2023-11-26 06:58:55,366 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9950, loss[loss=0.06437, simple_loss=0.08611, pruned_loss=0.01073, audio_tagging_loss=0.01058, over 15807.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09102, pruned_loss=0.01278, audio_tagging_loss=0.008555, over 3044538.68 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 06:58:56,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3272693.3333333335, ans=0.125
2023-11-26 06:59:09,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. limit=10.0
2023-11-26 06:59:27,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3272893.3333333335, ans=0.125
2023-11-26 06:59:28,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3272893.3333333335, ans=0.025
2023-11-26 06:59:39,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3272960.0, ans=0.125
2023-11-26 06:59:44,364 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490950
2023-11-26 06:59:48,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.546e+01 9.420e+01 1.008e+02 1.364e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 06:59:50,758 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10000, loss[loss=0.08641, simple_loss=0.1216, pruned_loss=0.01823, audio_tagging_loss=0.007369, over 13730.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09123, pruned_loss=0.0128, audio_tagging_loss=0.008551, over 3046899.90 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:00:12,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3273160.0, ans=0.0
2023-11-26 07:00:14,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3273160.0, ans=0.0
2023-11-26 07:00:25,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3273226.6666666665, ans=0.125
2023-11-26 07:00:32,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0
2023-11-26 07:00:40,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491000
2023-11-26 07:00:47,306 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10050, loss[loss=0.05701, simple_loss=0.07848, pruned_loss=0.008738, audio_tagging_loss=0.009035, over 15639.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09073, pruned_loss=0.01267, audio_tagging_loss=0.008491, over 3043438.16 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:01:14,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3273493.3333333335, ans=0.0
2023-11-26 07:01:29,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3273560.0, ans=0.125
2023-11-26 07:01:36,844 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491050
2023-11-26 07:01:41,061 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.461e+01 9.073e+01 9.880e+01 1.259e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-26 07:01:43,277 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10100, loss[loss=0.06597, simple_loss=0.08932, pruned_loss=0.01267, audio_tagging_loss=0.008641, over 15947.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09032, pruned_loss=0.01249, audio_tagging_loss=0.008611, over 3050094.48 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:01:49,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3273693.3333333335, ans=0.1
2023-11-26 07:01:51,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3273693.3333333335, ans=0.0
2023-11-26 07:02:20,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3273893.3333333335, ans=0.0
2023-11-26 07:02:21,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3273893.3333333335, ans=0.125
2023-11-26 07:02:28,944 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 07:02:32,760 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491100
2023-11-26 07:02:38,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3274026.6666666665, ans=0.125
2023-11-26 07:02:39,072 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10150, loss[loss=0.07335, simple_loss=0.1011, pruned_loss=0.01668, audio_tagging_loss=0.006117, over 15062.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09014, pruned_loss=0.01239, audio_tagging_loss=0.008746, over 3053350.43 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:02:42,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3274026.6666666665, ans=0.125
2023-11-26 07:02:50,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3274093.3333333335, ans=0.0
2023-11-26 07:02:59,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3274093.3333333335, ans=0.125
2023-11-26 07:03:04,564 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0
2023-11-26 07:03:06,044 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 07:03:10,411 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:03:18,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3274226.6666666665, ans=0.125
2023-11-26 07:03:25,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3274293.3333333335, ans=0.0
2023-11-26 07:03:25,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3274293.3333333335, ans=0.125
2023-11-26 07:03:28,343 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491150
2023-11-26 07:03:32,396 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.834e+01 9.375e+01 1.026e+02 1.327e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-26 07:03:34,515 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10200, loss[loss=0.08404, simple_loss=0.1106, pruned_loss=0.019, audio_tagging_loss=0.009738, over 15330.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08984, pruned_loss=0.01236, audio_tagging_loss=0.008867, over 3063016.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:03:40,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3274360.0, ans=10.0
2023-11-26 07:03:52,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3274426.6666666665, ans=0.125
2023-11-26 07:03:55,603 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 07:03:55,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3274493.3333333335, ans=0.2
2023-11-26 07:03:56,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3274493.3333333335, ans=0.125
2023-11-26 07:04:00,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3274493.3333333335, ans=0.5
2023-11-26 07:04:20,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.68 vs. limit=10.0
2023-11-26 07:04:23,772 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491200
2023-11-26 07:04:26,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274626.6666666665, ans=0.1
2023-11-26 07:04:30,701 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10250, loss[loss=0.08487, simple_loss=0.125, pruned_loss=0.01463, audio_tagging_loss=0.007752, over 15339.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.0912, pruned_loss=0.0126, audio_tagging_loss=0.008884, over 3059224.09 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:05:05,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3274893.3333333335, ans=0.05
2023-11-26 07:05:19,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491250
2023-11-26 07:05:22,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3274960.0, ans=0.0
2023-11-26 07:05:23,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.938e+01 9.745e+01 1.064e+02 1.415e+02, threshold=1.949e+02, percent-clipped=0.0
2023-11-26 07:05:25,860 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10300, loss[loss=0.05005, simple_loss=0.07128, pruned_loss=0.00699, audio_tagging_loss=0.007424, over 14439.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09052, pruned_loss=0.01264, audio_tagging_loss=0.00886, over 3060104.67 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:05:44,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3275093.3333333335, ans=0.125
2023-11-26 07:05:45,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3275093.3333333335, ans=0.125
2023-11-26 07:05:58,932 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:06:09,549 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:06:15,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491300
2023-11-26 07:06:22,357 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10350, loss[loss=0.07474, simple_loss=0.1094, pruned_loss=0.01412, audio_tagging_loss=0.005939, over 14918.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09116, pruned_loss=0.01274, audio_tagging_loss=0.008878, over 3056172.26 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:06:35,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2023-11-26 07:06:41,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3275426.6666666665, ans=0.125
2023-11-26 07:06:42,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3275426.6666666665, ans=0.125
2023-11-26 07:06:53,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3275493.3333333335, ans=0.125
2023-11-26 07:06:59,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3275560.0, ans=0.125
2023-11-26 07:07:11,730 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491350
2023-11-26 07:07:16,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.783e+01 9.372e+01 1.013e+02 2.774e+02, threshold=1.874e+02, percent-clipped=1.0
2023-11-26 07:07:18,537 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10400, loss[loss=0.07824, simple_loss=0.09108, pruned_loss=0.02069, audio_tagging_loss=0.01201, over 15177.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09067, pruned_loss=0.01273, audio_tagging_loss=0.009028, over 3051605.87 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:07:19,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3275693.3333333335, ans=0.07
2023-11-26 07:07:34,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0
2023-11-26 07:07:38,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3275760.0, ans=0.125
2023-11-26 07:07:39,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3275826.6666666665, ans=0.125
2023-11-26 07:07:41,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=3275826.6666666665, ans=0.1
2023-11-26 07:07:47,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3275826.6666666665, ans=0.125
2023-11-26 07:08:01,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3275893.3333333335, ans=0.0
2023-11-26 07:08:07,526 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491400
2023-11-26 07:08:07,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3275960.0, ans=0.125
2023-11-26 07:08:14,163 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10450, loss[loss=0.03878, simple_loss=0.05323, pruned_loss=0.004074, audio_tagging_loss=0.008087, over 14582.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08986, pruned_loss=0.01261, audio_tagging_loss=0.008971, over 3047771.54 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:08:27,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3276093.3333333335, ans=0.2
2023-11-26 07:08:27,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0
2023-11-26 07:08:37,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3276160.0, ans=0.125
2023-11-26 07:08:39,363 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:08:40,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3276160.0, ans=0.125
2023-11-26 07:08:45,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3276160.0, ans=0.5
2023-11-26 07:08:52,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3276226.6666666665, ans=0.125
2023-11-26 07:09:03,210 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491450
2023-11-26 07:09:07,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.707e+01 9.260e+01 9.868e+01 1.345e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-26 07:09:10,569 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10500, loss[loss=0.06554, simple_loss=0.09031, pruned_loss=0.01156, audio_tagging_loss=0.008824, over 15174.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08899, pruned_loss=0.01242, audio_tagging_loss=0.008926, over 3045772.66 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:09:40,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3276493.3333333335, ans=0.2
2023-11-26 07:09:47,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3276560.0, ans=0.0
2023-11-26 07:09:56,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0
2023-11-26 07:09:59,893 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491500
2023-11-26 07:10:06,821 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10550, loss[loss=0.04848, simple_loss=0.06095, pruned_loss=0.008693, audio_tagging_loss=0.009315, over 15963.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08965, pruned_loss=0.0125, audio_tagging_loss=0.008806, over 3047844.31 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:10:10,122 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:10:26,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0
2023-11-26 07:10:46,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0
2023-11-26 07:10:49,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3276893.3333333335, ans=0.125
2023-11-26 07:10:55,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491550
2023-11-26 07:11:00,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.562e+01 9.260e+01 9.916e+01 1.260e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-26 07:11:01,901 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10600, loss[loss=0.07903, simple_loss=0.1192, pruned_loss=0.01448, audio_tagging_loss=0.004964, over 15501.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09005, pruned_loss=0.01252, audio_tagging_loss=0.008733, over 3042508.51 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:11:08,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=22.5
2023-11-26 07:11:29,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3277160.0, ans=0.1
2023-11-26 07:11:41,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3277226.6666666665, ans=0.125
2023-11-26 07:11:50,590 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491600
2023-11-26 07:11:54,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3277293.3333333335, ans=0.125
2023-11-26 07:11:57,749 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10650, loss[loss=0.08641, simple_loss=0.1227, pruned_loss=0.01737, audio_tagging_loss=0.007681, over 15048.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09049, pruned_loss=0.01254, audio_tagging_loss=0.008582, over 3047869.19 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:12:09,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3277426.6666666665, ans=0.1
2023-11-26 07:12:19,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3277493.3333333335, ans=0.2
2023-11-26 07:12:46,784 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491650
2023-11-26 07:12:47,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3277626.6666666665, ans=0.0
2023-11-26 07:12:48,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3277626.6666666665, ans=0.125
2023-11-26 07:12:50,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3277626.6666666665, ans=0.1
2023-11-26 07:12:53,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.757e+01 9.487e+01 1.015e+02 1.210e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-26 07:12:53,571 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10700, loss[loss=0.06713, simple_loss=0.08333, pruned_loss=0.01339, audio_tagging_loss=0.01208, over 14517.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09053, pruned_loss=0.01258, audio_tagging_loss=0.008584, over 3044531.56 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0
2023-11-26 07:13:00,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3277693.3333333335, ans=0.0
2023-11-26 07:13:02,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3277693.3333333335, ans=0.125
2023-11-26 07:13:12,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3277760.0, ans=0.1
2023-11-26 07:13:13,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3277760.0, ans=0.1
2023-11-26 07:13:13,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3277760.0, ans=0.0
2023-11-26 07:13:21,097 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:13:21,122 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:13:38,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0
2023-11-26 07:13:42,682 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491700
2023-11-26 07:13:42,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3277960.0, ans=0.0
2023-11-26 07:13:48,923 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10750, loss[loss=0.06697, simple_loss=0.0975, pruned_loss=0.01044, audio_tagging_loss=0.007771, over 14741.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09032, pruned_loss=0.01249, audio_tagging_loss=0.008503, over 3047760.60 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 8.0
2023-11-26 07:13:51,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3278026.6666666665, ans=0.125
2023-11-26 07:14:08,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3278093.3333333335, ans=0.0
2023-11-26 07:14:10,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0
2023-11-26 07:14:15,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3278160.0, ans=0.125
2023-11-26 07:14:32,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3278293.3333333335, ans=0.0
2023-11-26 07:14:37,778 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491750
2023-11-26 07:14:44,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.438e+01 9.296e+01 1.012e+02 1.543e+02, threshold=1.859e+02, percent-clipped=0.0
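The grad_scale field in the batch records is the fp16 loss scale, and in this section it moves the way a dynamic loss scaler moves it: it is cut (32.0 earlier, then 16.0, then 8.0 at batch 10700 above) when a scaled gradient overflows in half precision, and it grows back after a run of clean steps (16.0 again by batch 10800 below). The snippet is a generic torch.cuda.amp mixed-precision step illustrating that mechanism; it is not this project's training loop, and the constants are chosen to echo the logged values rather than taken from the code.

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,    # echoes the grad_scale values printed above (assumption)
    growth_factor=2.0,  # scale doubles after enough overflow-free steps
    backoff_factor=0.5, # scale halves on an inf/nan gradient, e.g. 32 -> 16 -> 8
)

def train_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # skips the update if gradients overflowed
    scaler.update()                # adjusts the scale; this is what the log tracks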
2023-11-26 07:14:44,199 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10800, loss[loss=0.06796, simple_loss=0.09427, pruned_loss=0.01239, audio_tagging_loss=0.008433, over 14924.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08958, pruned_loss=0.01232, audio_tagging_loss=0.008501, over 3049205.60 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:14:45,434 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:14:45,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3278360.0, ans=0.125
2023-11-26 07:14:45,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3278360.0, ans=0.125
2023-11-26 07:15:19,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0
2023-11-26 07:15:25,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5
2023-11-26 07:15:28,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0
2023-11-26 07:15:33,512 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491800
2023-11-26 07:15:41,225 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10850, loss[loss=0.07102, simple_loss=0.09632, pruned_loss=0.01174, audio_tagging_loss=0.01111, over 14638.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08981, pruned_loss=0.01235, audio_tagging_loss=0.008583, over 3051763.87 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:15:46,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3278693.3333333335, ans=0.125
2023-11-26 07:15:47,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3278693.3333333335, ans=0.125
2023-11-26 07:15:56,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0
2023-11-26 07:15:56,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3278760.0, ans=0.0
2023-11-26 07:16:04,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5
2023-11-26 07:16:05,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3278826.6666666665, ans=0.0
2023-11-26 07:16:14,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3278893.3333333335, ans=0.0
2023-11-26 07:16:18,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3278893.3333333335, ans=0.1
2023-11-26 07:16:30,360 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491850
2023-11-26 07:16:33,505 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 07:16:36,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.766e+01 9.451e+01 1.013e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-26 07:16:36,694 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10900, loss[loss=0.05092, simple_loss=0.06396, pruned_loss=0.009688, audio_tagging_loss=0.009257, over 15067.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08899, pruned_loss=0.01232, audio_tagging_loss=0.00872, over 3050786.21 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:16:48,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3279093.3333333335, ans=0.125
2023-11-26 07:16:50,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3279093.3333333335, ans=0.0
2023-11-26 07:17:02,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3279160.0, ans=0.125
2023-11-26 07:17:02,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5
2023-11-26 07:17:04,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3279160.0, ans=0.1
2023-11-26 07:17:16,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3279226.6666666665, ans=0.125
2023-11-26 07:17:23,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3279293.3333333335, ans=0.0
2023-11-26 07:17:25,287 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491900
2023-11-26 07:17:31,490 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10950, loss[loss=0.06106, simple_loss=0.08284, pruned_loss=0.009118, audio_tagging_loss=0.01053, over 15755.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08957, pruned_loss=0.01227, audio_tagging_loss=0.008802, over 3054179.57 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:17:33,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.67 vs. limit=15.0
2023-11-26 07:17:35,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3279360.0, ans=0.0
2023-11-26 07:17:43,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3279426.6666666665, ans=0.04949747468305833
2023-11-26 07:17:47,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3279426.6666666665, ans=0.0
2023-11-26 07:18:00,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3279493.3333333335, ans=0.125
2023-11-26 07:18:03,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3279493.3333333335, ans=0.2
2023-11-26 07:18:04,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3279560.0, ans=0.125
2023-11-26 07:18:10,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3279560.0, ans=0.1
2023-11-26 07:18:20,727 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491950
2023-11-26 07:18:27,601 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.778e+01 9.414e+01 1.024e+02 1.293e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-26 07:18:27,631 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11000, loss[loss=0.05278, simple_loss=0.05598, pruned_loss=0.008473, audio_tagging_loss=0.01632, over 14130.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08991, pruned_loss=0.01233, audio_tagging_loss=0.008874, over 3050873.82 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0
2023-11-26 07:18:36,599 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 07:18:36,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3279693.3333333335, ans=0.125
2023-11-26 07:18:40,834 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0
2023-11-26 07:18:42,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3279760.0, ans=0.2
limit=15.0 2023-11-26 07:18:46,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3279760.0, ans=0.5 2023-11-26 07:18:51,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3279826.6666666665, ans=0.1 2023-11-26 07:18:51,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3279826.6666666665, ans=0.07 2023-11-26 07:18:58,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3279826.6666666665, ans=0.125 2023-11-26 07:19:16,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492000 2023-11-26 07:19:25,737 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11050, loss[loss=0.07339, simple_loss=0.1108, pruned_loss=0.01011, audio_tagging_loss=0.007896, over 14753.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09006, pruned_loss=0.01243, audio_tagging_loss=0.008967, over 3048378.78 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:19:25,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3280026.6666666665, ans=0.025 2023-11-26 07:19:30,732 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=22.5 2023-11-26 07:19:38,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3280093.3333333335, ans=0.05 2023-11-26 07:20:14,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492050 2023-11-26 07:20:16,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3280293.3333333335, ans=0.125 2023-11-26 07:20:20,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.875e+01 9.418e+01 1.004e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 07:20:20,722 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11100, loss[loss=0.0842, simple_loss=0.1146, pruned_loss=0.01783, audio_tagging_loss=0.009084, over 14796.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09001, pruned_loss=0.01242, audio_tagging_loss=0.00904, over 3052770.09 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:20:27,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3280360.0, ans=0.0 2023-11-26 07:20:30,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3280426.6666666665, ans=0.2 2023-11-26 07:20:33,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3280426.6666666665, ans=0.0 2023-11-26 07:20:38,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2023-11-26 07:20:39,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.66 vs. 
limit=12.0 2023-11-26 07:20:46,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3280493.3333333335, ans=0.04949747468305833 2023-11-26 07:20:48,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3280493.3333333335, ans=0.2 2023-11-26 07:21:09,386 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492100 2023-11-26 07:21:13,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280626.6666666665, ans=0.1 2023-11-26 07:21:15,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3280693.3333333335, ans=0.125 2023-11-26 07:21:16,282 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11150, loss[loss=0.06918, simple_loss=0.09429, pruned_loss=0.01108, audio_tagging_loss=0.01095, over 16084.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09037, pruned_loss=0.01248, audio_tagging_loss=0.00908, over 3054138.46 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:21:41,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3280826.6666666665, ans=0.1 2023-11-26 07:21:42,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3280826.6666666665, ans=0.0 2023-11-26 07:22:05,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492150 2023-11-26 07:22:06,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. limit=10.0 2023-11-26 07:22:10,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3280960.0, ans=0.0 2023-11-26 07:22:12,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.937e+01 9.375e+01 1.012e+02 1.316e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 07:22:12,787 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11200, loss[loss=0.06264, simple_loss=0.08823, pruned_loss=0.009236, audio_tagging_loss=0.009287, over 14709.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.0904, pruned_loss=0.01258, audio_tagging_loss=0.009102, over 3050170.66 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:22:13,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3281026.6666666665, ans=0.5 2023-11-26 07:22:14,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3281026.6666666665, ans=0.125 2023-11-26 07:22:14,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. 
limit=15.0 2023-11-26 07:22:29,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3281093.3333333335, ans=0.0 2023-11-26 07:22:29,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3281093.3333333335, ans=0.0 2023-11-26 07:22:47,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3281226.6666666665, ans=0.5 2023-11-26 07:22:50,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3281226.6666666665, ans=0.2 2023-11-26 07:22:53,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3281226.6666666665, ans=0.125 2023-11-26 07:23:01,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492200 2023-11-26 07:23:08,503 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11250, loss[loss=0.07032, simple_loss=0.09392, pruned_loss=0.01302, audio_tagging_loss=0.01033, over 15840.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.0906, pruned_loss=0.01269, audio_tagging_loss=0.009087, over 3052988.13 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:23:08,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3281360.0, ans=0.0 2023-11-26 07:23:37,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3281493.3333333335, ans=0.0 2023-11-26 07:23:57,217 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492250 2023-11-26 07:23:58,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=22.5 2023-11-26 07:24:04,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.640e+01 9.467e+01 1.012e+02 1.426e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 07:24:04,056 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11300, loss[loss=0.07581, simple_loss=0.1073, pruned_loss=0.01235, audio_tagging_loss=0.009826, over 15063.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09053, pruned_loss=0.01254, audio_tagging_loss=0.008879, over 3044891.25 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:24:10,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.28 vs. 
limit=15.0 2023-11-26 07:24:21,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3281760.0, ans=0.125 2023-11-26 07:24:21,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281760.0, ans=0.1 2023-11-26 07:24:26,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3281826.6666666665, ans=0.0 2023-11-26 07:24:36,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3281893.3333333335, ans=0.125 2023-11-26 07:24:43,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3281893.3333333335, ans=0.0 2023-11-26 07:24:53,314 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492300 2023-11-26 07:24:55,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3281960.0, ans=0.0 2023-11-26 07:24:59,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0 2023-11-26 07:25:00,214 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11350, loss[loss=0.08041, simple_loss=0.1236, pruned_loss=0.01404, audio_tagging_loss=0.004579, over 15896.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09051, pruned_loss=0.01255, audio_tagging_loss=0.00869, over 3053365.77 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:25:48,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492350 2023-11-26 07:25:49,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3282293.3333333335, ans=0.125 2023-11-26 07:25:49,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-26 07:25:55,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.982e+01 9.660e+01 1.025e+02 3.694e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-26 07:25:55,322 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11400, loss[loss=0.07795, simple_loss=0.1041, pruned_loss=0.0159, audio_tagging_loss=0.00999, over 15079.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09043, pruned_loss=0.01253, audio_tagging_loss=0.008503, over 3050796.22 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:25:59,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3282360.0, ans=0.125 2023-11-26 07:26:10,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3282426.6666666665, ans=0.0 2023-11-26 07:26:33,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3282560.0, ans=0.07 2023-11-26 07:26:34,896 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:26:44,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492400 2023-11-26 07:26:49,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3282626.6666666665, ans=0.2 2023-11-26 07:26:50,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2023-11-26 07:26:51,867 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11450, loss[loss=0.04878, simple_loss=0.06402, pruned_loss=0.005902, audio_tagging_loss=0.01087, over 15527.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.0897, pruned_loss=0.01238, audio_tagging_loss=0.008605, over 3041330.07 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:27:29,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3282893.3333333335, ans=0.0 2023-11-26 07:27:33,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3282893.3333333335, ans=0.0 2023-11-26 07:27:39,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3282960.0, ans=0.1 2023-11-26 07:27:40,492 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492450 2023-11-26 07:27:47,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.866e+01 9.675e+01 1.039e+02 1.240e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 07:27:47,960 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11500, loss[loss=0.06969, simple_loss=0.09174, pruned_loss=0.01487, audio_tagging_loss=0.008943, over 15233.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08877, pruned_loss=0.01239, audio_tagging_loss=0.008577, over 3042683.16 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:28:12,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3283160.0, ans=0.125 2023-11-26 07:28:13,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3283160.0, ans=0.1 2023-11-26 07:28:28,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3283226.6666666665, ans=0.1 2023-11-26 07:28:36,730 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492500 2023-11-26 07:28:42,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3283360.0, ans=0.2 2023-11-26 07:28:42,938 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11550, loss[loss=0.07863, simple_loss=0.1039, pruned_loss=0.01691, audio_tagging_loss=0.009787, over 14940.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08929, pruned_loss=0.01238, audio_tagging_loss=0.008599, over 3050881.42 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:28:50,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3283360.0, ans=0.125 2023-11-26 07:29:00,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-26 07:29:16,758 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:29:22,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3283560.0, ans=0.125 2023-11-26 07:29:27,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3283626.6666666665, ans=0.2 2023-11-26 07:29:31,612 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492550 2023-11-26 07:29:38,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.923e+01 9.634e+01 1.014e+02 1.304e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 07:29:38,941 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11600, loss[loss=0.07166, simple_loss=0.09224, pruned_loss=0.018, audio_tagging_loss=0.007538, over 15112.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08918, pruned_loss=0.01245, audio_tagging_loss=0.008784, over 3046462.93 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:30:01,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. 
limit=22.5 2023-11-26 07:30:04,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3283826.6666666665, ans=0.125 2023-11-26 07:30:04,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3283826.6666666665, ans=0.125 2023-11-26 07:30:08,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3283826.6666666665, ans=0.0 2023-11-26 07:30:12,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3283893.3333333335, ans=0.0 2023-11-26 07:30:22,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0 2023-11-26 07:30:27,711 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492600 2023-11-26 07:30:34,798 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11650, loss[loss=0.06647, simple_loss=0.09431, pruned_loss=0.01181, audio_tagging_loss=0.007501, over 15562.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08919, pruned_loss=0.0124, audio_tagging_loss=0.008772, over 3045904.05 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:30:38,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3284026.6666666665, ans=0.125 2023-11-26 07:30:46,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3284093.3333333335, ans=0.125 2023-11-26 07:31:06,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2023-11-26 07:31:15,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3284226.6666666665, ans=0.035 2023-11-26 07:31:23,689 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492650 2023-11-26 07:31:29,932 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.493e+01 9.108e+01 9.754e+01 1.305e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 07:31:29,959 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11700, loss[loss=0.06981, simple_loss=0.09933, pruned_loss=0.01415, audio_tagging_loss=0.005998, over 16736.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0892, pruned_loss=0.01235, audio_tagging_loss=0.008823, over 3048197.93 frames. 
], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:31:32,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3284360.0, ans=0.0 2023-11-26 07:31:33,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3284360.0, ans=0.125 2023-11-26 07:31:35,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3284360.0, ans=0.2 2023-11-26 07:32:02,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3284560.0, ans=0.125 2023-11-26 07:32:02,978 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:32:18,476 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492700 2023-11-26 07:32:24,687 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11750, loss[loss=0.05617, simple_loss=0.06924, pruned_loss=0.009552, audio_tagging_loss=0.01199, over 15839.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.089, pruned_loss=0.0124, audio_tagging_loss=0.00885, over 3045021.53 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:32:27,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3284693.3333333335, ans=0.125 2023-11-26 07:32:33,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-11-26 07:32:45,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3284760.0, ans=0.0 2023-11-26 07:33:14,313 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492750 2023-11-26 07:33:21,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.644e+01 9.343e+01 1.016e+02 1.345e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 07:33:21,112 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11800, loss[loss=0.08219, simple_loss=0.1091, pruned_loss=0.01908, audio_tagging_loss=0.008571, over 15416.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08921, pruned_loss=0.0124, audio_tagging_loss=0.008935, over 3043078.76 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:33:33,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-26 07:33:35,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3285093.3333333335, ans=0.0 2023-11-26 07:33:35,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3285093.3333333335, ans=0.0 2023-11-26 07:34:00,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3285226.6666666665, ans=0.125 2023-11-26 07:34:07,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. 
limit=10.0 2023-11-26 07:34:10,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492800 2023-11-26 07:34:12,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3285293.3333333335, ans=0.125 2023-11-26 07:34:16,630 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11850, loss[loss=0.06815, simple_loss=0.0956, pruned_loss=0.01216, audio_tagging_loss=0.008184, over 15139.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08843, pruned_loss=0.01223, audio_tagging_loss=0.009053, over 3034698.35 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:34:27,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-11-26 07:34:38,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3285493.3333333335, ans=0.125 2023-11-26 07:34:51,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3285560.0, ans=0.125 2023-11-26 07:34:55,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3285560.0, ans=15.0 2023-11-26 07:34:55,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3285560.0, ans=0.125 2023-11-26 07:35:03,624 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2023-11-26 07:35:05,289 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492850 2023-11-26 07:35:11,677 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11900, loss[loss=0.09308, simple_loss=0.137, pruned_loss=0.01691, audio_tagging_loss=0.007651, over 14823.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08944, pruned_loss=0.01237, audio_tagging_loss=0.009151, over 3043726.87 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:35:11,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3285693.3333333335, ans=0.0 2023-11-26 07:35:12,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.644e+01 9.176e+01 9.875e+01 1.365e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 07:35:19,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3285693.3333333335, ans=0.0 2023-11-26 07:35:31,928 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2023-11-26 07:35:44,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-11-26 07:36:00,454 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492900 2023-11-26 07:36:07,284 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11950, loss[loss=0.05529, simple_loss=0.07109, pruned_loss=0.009838, audio_tagging_loss=0.009909, over 17305.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08961, pruned_loss=0.01259, audio_tagging_loss=0.009123, over 3049765.83 frames. 
], batch size: 66, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:36:07,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3286026.6666666665, ans=0.0 2023-11-26 07:36:20,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3286093.3333333335, ans=0.125 2023-11-26 07:36:26,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3286093.3333333335, ans=0.035 2023-11-26 07:36:54,464 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492950 2023-11-26 07:37:00,472 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 12000, loss[loss=0.07133, simple_loss=0.1059, pruned_loss=0.01085, audio_tagging_loss=0.007509, over 15101.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09033, pruned_loss=0.01254, audio_tagging_loss=0.009188, over 3055335.35 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:37:00,472 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 07:37:20,979 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3842, 2.9579, 3.2870, 2.9786, 3.7300, 3.7823, 3.2750, 3.2364], device='cuda:2') 2023-11-26 07:37:21,384 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4943, 3.3668, 3.8114, 3.6717], device='cuda:2') 2023-11-26 07:37:33,032 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05803, simple_loss=0.05068, pruned_loss=0.005323, audio_tagging_loss=0.02736, over 4681554.00 frames. 2023-11-26 07:37:33,032 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 07:37:35,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.785e+01 9.392e+01 1.025e+02 1.388e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 07:37:40,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3286360.0, ans=0.125 2023-11-26 07:37:45,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3286426.6666666665, ans=0.1 2023-11-26 07:37:48,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3286426.6666666665, ans=0.125 2023-11-26 07:37:53,594 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:38:28,136 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 0, loss[loss=0.07136, simple_loss=0.07042, pruned_loss=0.01114, audio_tagging_loss=0.025, over 16824.00 frames. ], tot_loss[loss=0.07136, simple_loss=0.07042, pruned_loss=0.01114, audio_tagging_loss=0.025, over 16824.00 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:38:28,136 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 07:38:41,660 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7773, 3.2588, 4.5385, 3.4608], device='cuda:2') 2023-11-26 07:38:59,441 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05791, simple_loss=0.05064, pruned_loss=0.005256, audio_tagging_loss=0.02733, over 4681554.00 frames. 
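The zipformer.py:1877 entries above print attn_weights_entropy tensors for a few self-attention modules during the validation pass. The actual diagnostic lives inside icefall's zipformer.py; the sketch below only shows the standard quantity such a line reports, assuming row-normalized attention weights. The function name and tensor shapes here are illustrative, not the real icefall API.

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        """Per-head entropy of attention distributions, averaged over query positions.

        attn_weights: (num_heads, num_queries, num_keys), each row summing to 1.
        """
        p = attn_weights.clamp(min=eps)          # avoid log(0)
        entropy = -(p * p.log()).sum(dim=-1)     # (num_heads, num_queries)
        return entropy.mean(dim=-1)              # one scalar per head

Uniform attention over N keys has entropy log(N), so a logged value such as 4.49 corresponds to a head spreading its weight over roughly e^4.49 ≈ 89 frames, while a value near 0 means the head attends to essentially a single frame.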
2023-11-26 07:38:59,441 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 07:39:01,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3286513.3333333335, ans=10.0 2023-11-26 07:39:01,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3286513.3333333335, ans=0.95 2023-11-26 07:39:08,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3286513.3333333335, ans=0.5 2023-11-26 07:39:23,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493000 2023-11-26 07:39:40,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3286713.3333333335, ans=0.035 2023-11-26 07:39:45,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=12.0 2023-11-26 07:39:51,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2023-11-26 07:39:54,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3286780.0, ans=0.125 2023-11-26 07:39:56,093 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 50, loss[loss=0.07116, simple_loss=0.08859, pruned_loss=0.01382, audio_tagging_loss=0.01304, over 14412.00 frames. ], tot_loss[loss=0.07298, simple_loss=0.08809, pruned_loss=0.01192, audio_tagging_loss=0.01701, over 684411.99 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:40:01,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3286846.6666666665, ans=0.09899494936611666 2023-11-26 07:40:17,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3286980.0, ans=0.0 2023-11-26 07:40:19,850 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493050 2023-11-26 07:40:22,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3286980.0, ans=0.125 2023-11-26 07:40:26,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.17 vs. limit=10.0 2023-11-26 07:40:29,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.630e+01 1.022e+02 1.088e+02 1.448e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-26 07:40:45,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3287113.3333333335, ans=0.0 2023-11-26 07:40:52,962 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 100, loss[loss=0.09252, simple_loss=0.1365, pruned_loss=0.01696, audio_tagging_loss=0.007293, over 15551.00 frames. ], tot_loss[loss=0.07327, simple_loss=0.0894, pruned_loss=0.01239, audio_tagging_loss=0.01618, over 1202746.72 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:04,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.87 vs. 
limit=22.5 2023-11-26 07:41:09,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3287246.6666666665, ans=0.0 2023-11-26 07:41:16,409 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493100 2023-11-26 07:41:21,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3287313.3333333335, ans=0.1 2023-11-26 07:41:33,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3287380.0, ans=0.0 2023-11-26 07:41:35,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-26 07:41:41,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2023-11-26 07:41:44,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3287446.6666666665, ans=0.5 2023-11-26 07:41:48,979 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 150, loss[loss=0.07687, simple_loss=0.1018, pruned_loss=0.01315, audio_tagging_loss=0.01283, over 16135.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.08937, pruned_loss=0.01247, audio_tagging_loss=0.01462, over 1613899.78 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:53,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3287513.3333333335, ans=0.125 2023-11-26 07:42:00,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3287580.0, ans=0.2 2023-11-26 07:42:13,184 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493150 2023-11-26 07:42:14,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3287646.6666666665, ans=0.0 2023-11-26 07:42:21,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 9.081e+01 9.641e+01 1.033e+02 1.343e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 07:42:44,911 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 200, loss[loss=0.0821, simple_loss=0.1056, pruned_loss=0.01827, audio_tagging_loss=0.01105, over 15955.00 frames. ], tot_loss[loss=0.07084, simple_loss=0.09066, pruned_loss=0.01267, audio_tagging_loss=0.01284, over 1934502.78 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:42:52,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0 2023-11-26 07:42:58,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3287913.3333333335, ans=0.04949747468305833 2023-11-26 07:43:08,493 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493200 2023-11-26 07:43:17,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. 
limit=12.0 2023-11-26 07:43:25,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3288046.6666666665, ans=0.1 2023-11-26 07:43:41,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3288180.0, ans=0.1 2023-11-26 07:43:41,874 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 250, loss[loss=0.06955, simple_loss=0.09203, pruned_loss=0.01146, audio_tagging_loss=0.01207, over 13935.00 frames. ], tot_loss[loss=0.07004, simple_loss=0.09148, pruned_loss=0.01276, audio_tagging_loss=0.01154, over 2182383.92 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:43:44,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3288180.0, ans=0.2 2023-11-26 07:43:53,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3288246.6666666665, ans=0.1 2023-11-26 07:44:01,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3288246.6666666665, ans=0.04949747468305833 2023-11-26 07:44:04,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-26 07:44:05,386 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493250 2023-11-26 07:44:07,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3288313.3333333335, ans=0.0 2023-11-26 07:44:14,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.741e+01 9.429e+01 1.027e+02 1.277e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:44:19,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3288380.0, ans=0.125 2023-11-26 07:44:23,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3288380.0, ans=0.125 2023-11-26 07:44:32,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3288446.6666666665, ans=0.0 2023-11-26 07:44:37,410 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 300, loss[loss=0.06796, simple_loss=0.09245, pruned_loss=0.01374, audio_tagging_loss=0.007999, over 14540.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09134, pruned_loss=0.01285, audio_tagging_loss=0.01069, over 2368121.52 frames. 
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:44:39,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3288513.3333333335, ans=0.1 2023-11-26 07:45:00,699 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493300 2023-11-26 07:45:26,446 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:45:32,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3288846.6666666665, ans=0.125 2023-11-26 07:45:33,045 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 350, loss[loss=0.06319, simple_loss=0.08398, pruned_loss=0.01256, audio_tagging_loss=0.008642, over 14541.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09038, pruned_loss=0.01264, audio_tagging_loss=0.01013, over 2519326.68 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:45:37,599 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:45:50,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 07:45:56,624 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493350 2023-11-26 07:46:00,270 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2023-11-26 07:46:06,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.599e+01 9.325e+01 1.001e+02 1.376e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:46:19,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-26 07:46:24,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-26 07:46:29,005 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 400, loss[loss=0.07221, simple_loss=0.1036, pruned_loss=0.01332, audio_tagging_loss=0.007112, over 14410.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.08928, pruned_loss=0.01238, audio_tagging_loss=0.009889, over 2634503.32 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:46:52,939 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493400 2023-11-26 07:46:55,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3289313.3333333335, ans=0.125 2023-11-26 07:47:02,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3289380.0, ans=0.0 2023-11-26 07:47:09,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3289380.0, ans=0.2 2023-11-26 07:47:14,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3289446.6666666665, ans=0.09899494936611666 2023-11-26 07:47:25,003 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 450, loss[loss=0.05891, simple_loss=0.07968, pruned_loss=0.009039, audio_tagging_loss=0.01003, over 14316.00 frames. 
], tot_loss[loss=0.06687, simple_loss=0.08968, pruned_loss=0.01243, audio_tagging_loss=0.009598, over 2727606.02 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:47:31,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3289513.3333333335, ans=0.02 2023-11-26 07:47:34,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.94 vs. limit=5.0 2023-11-26 07:47:41,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3289580.0, ans=0.125 2023-11-26 07:47:48,968 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493450 2023-11-26 07:47:59,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.870e+01 9.366e+01 1.009e+02 1.216e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 07:47:59,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3289713.3333333335, ans=0.0 2023-11-26 07:48:01,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3289713.3333333335, ans=0.04949747468305833 2023-11-26 07:48:04,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3289713.3333333335, ans=0.2 2023-11-26 07:48:19,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3289780.0, ans=0.125 2023-11-26 07:48:21,394 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 500, loss[loss=0.06059, simple_loss=0.07964, pruned_loss=0.01016, audio_tagging_loss=0.0106, over 16695.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08926, pruned_loss=0.01227, audio_tagging_loss=0.00949, over 2800037.97 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:48:34,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2023-11-26 07:48:35,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3289913.3333333335, ans=0.0 2023-11-26 07:48:40,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3289913.3333333335, ans=0.0 2023-11-26 07:48:41,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-26 07:48:44,916 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493500 2023-11-26 07:48:54,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3290046.6666666665, ans=0.2 2023-11-26 07:48:56,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3290046.6666666665, ans=0.0 2023-11-26 07:49:16,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3290180.0, ans=0.1 2023-11-26 07:49:17,269 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 550, loss[loss=0.07174, simple_loss=0.1002, pruned_loss=0.01394, audio_tagging_loss=0.007726, over 14160.00 frames. 
], tot_loss[loss=0.06647, simple_loss=0.08952, pruned_loss=0.01237, audio_tagging_loss=0.009334, over 2859601.09 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:49:33,046 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:49:40,756 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493550 2023-11-26 07:49:51,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.897e+01 9.489e+01 1.022e+02 1.296e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 07:50:00,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3290380.0, ans=0.05 2023-11-26 07:50:13,148 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 600, loss[loss=0.06604, simple_loss=0.0938, pruned_loss=0.00985, audio_tagging_loss=0.009296, over 14787.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09048, pruned_loss=0.01244, audio_tagging_loss=0.009091, over 2903881.61 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:50:19,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2023-11-26 07:50:23,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3290580.0, ans=0.1 2023-11-26 07:50:36,771 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493600 2023-11-26 07:50:38,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3290646.6666666665, ans=0.125 2023-11-26 07:50:41,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3290646.6666666665, ans=0.1 2023-11-26 07:50:55,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3290713.3333333335, ans=0.2 2023-11-26 07:51:08,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3290846.6666666665, ans=0.2 2023-11-26 07:51:09,746 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 650, loss[loss=0.06003, simple_loss=0.07156, pruned_loss=0.01108, audio_tagging_loss=0.01317, over 14911.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09057, pruned_loss=0.01254, audio_tagging_loss=0.00904, over 2938885.37 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:51:14,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3290846.6666666665, ans=0.0 2023-11-26 07:51:21,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3290913.3333333335, ans=0.2 2023-11-26 07:51:31,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3290980.0, ans=0.1 2023-11-26 07:51:32,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493650 2023-11-26 07:51:34,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. 
limit=10.0 2023-11-26 07:51:40,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3290980.0, ans=0.0 2023-11-26 07:51:41,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3291046.6666666665, ans=0.0 2023-11-26 07:51:44,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.602e+01 9.245e+01 1.014e+02 1.320e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 07:51:47,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3291046.6666666665, ans=0.0 2023-11-26 07:51:55,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3291113.3333333335, ans=0.125 2023-11-26 07:52:05,568 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 700, loss[loss=0.05977, simple_loss=0.07306, pruned_loss=0.0137, audio_tagging_loss=0.009544, over 15216.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09019, pruned_loss=0.01254, audio_tagging_loss=0.008966, over 2964513.97 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:52:06,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3291180.0, ans=0.125 2023-11-26 07:52:18,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-26 07:52:19,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=15.0 2023-11-26 07:52:21,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2023-11-26 07:52:29,160 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493700 2023-11-26 07:52:46,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3291380.0, ans=0.0 2023-11-26 07:52:52,714 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:52:54,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291446.6666666665, ans=0.1 2023-11-26 07:52:56,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3291446.6666666665, ans=0.0 2023-11-26 07:53:01,092 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 750, loss[loss=0.06638, simple_loss=0.08977, pruned_loss=0.01177, audio_tagging_loss=0.009722, over 14867.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09018, pruned_loss=0.01251, audio_tagging_loss=0.009009, over 2982065.65 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:53:21,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3291580.0, ans=0.125 2023-11-26 07:53:25,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493750 2023-11-26 07:53:31,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3291646.6666666665, ans=0.125 2023-11-26 07:53:36,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.579e+01 9.292e+01 9.836e+01 1.327e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 07:53:43,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3291713.3333333335, ans=0.1 2023-11-26 07:53:44,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3291713.3333333335, ans=0.2 2023-11-26 07:53:47,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3291780.0, ans=0.125 2023-11-26 07:53:58,250 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 800, loss[loss=0.05412, simple_loss=0.06748, pruned_loss=0.008422, audio_tagging_loss=0.01196, over 14858.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09133, pruned_loss=0.01263, audio_tagging_loss=0.009017, over 3009298.88 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:54:01,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3291846.6666666665, ans=0.125 2023-11-26 07:54:01,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2023-11-26 07:54:03,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.71 vs. limit=22.5 2023-11-26 07:54:07,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3291913.3333333335, ans=0.0 2023-11-26 07:54:20,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3291980.0, ans=0.07 2023-11-26 07:54:21,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493800 2023-11-26 07:54:45,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3292113.3333333335, ans=0.125 2023-11-26 07:54:53,895 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 850, loss[loss=0.07112, simple_loss=0.09603, pruned_loss=0.0146, audio_tagging_loss=0.008504, over 15009.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09083, pruned_loss=0.01277, audio_tagging_loss=0.009116, over 3018400.60 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:55:05,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-26 07:55:16,675 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493850 2023-11-26 07:55:27,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. 
limit=22.5 2023-11-26 07:55:29,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.678e+01 9.372e+01 1.019e+02 1.445e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 07:55:29,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3292380.0, ans=0.0 2023-11-26 07:55:42,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2023-11-26 07:55:48,978 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 900, loss[loss=0.07165, simple_loss=0.09062, pruned_loss=0.01242, audio_tagging_loss=0.01391, over 14813.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09009, pruned_loss=0.0126, audio_tagging_loss=0.009235, over 3021591.46 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:55:56,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3292513.3333333335, ans=0.1 2023-11-26 07:56:12,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3292646.6666666665, ans=0.0 2023-11-26 07:56:13,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-26 07:56:24,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3292713.3333333335, ans=0.125 2023-11-26 07:56:26,000 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2023-11-26 07:56:29,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3292713.3333333335, ans=0.0 2023-11-26 07:56:30,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.64 vs. limit=22.5 2023-11-26 07:56:31,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3292713.3333333335, ans=0.125 2023-11-26 07:56:34,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3292780.0, ans=0.2 2023-11-26 07:56:45,001 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 950, loss[loss=0.05174, simple_loss=0.07613, pruned_loss=0.005476, audio_tagging_loss=0.0082, over 15815.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09116, pruned_loss=0.01258, audio_tagging_loss=0.009014, over 3028308.54 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:56:47,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=15.0 2023-11-26 07:57:09,233 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-26 07:57:13,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3292980.0, ans=0.1 2023-11-26 07:57:13,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-26 07:57:17,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3293046.6666666665, ans=0.125 2023-11-26 07:57:20,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.738e+01 9.325e+01 9.888e+01 1.254e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:57:39,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3293113.3333333335, ans=0.125 2023-11-26 07:57:41,285 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1000, loss[loss=0.06729, simple_loss=0.09567, pruned_loss=0.01029, audio_tagging_loss=0.009166, over 16537.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09061, pruned_loss=0.01256, audio_tagging_loss=0.008882, over 3022415.25 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:57:42,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3293180.0, ans=0.2 2023-11-26 07:57:49,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3293180.0, ans=0.2 2023-11-26 07:57:57,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3293246.6666666665, ans=0.125 2023-11-26 07:58:04,332 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:58:04,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-26 07:58:05,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=12.0 2023-11-26 07:58:22,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3293380.0, ans=0.125 2023-11-26 07:58:27,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3293446.6666666665, ans=0.125 2023-11-26 07:58:37,591 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1050, loss[loss=0.05847, simple_loss=0.08219, pruned_loss=0.008192, audio_tagging_loss=0.009187, over 16910.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08976, pruned_loss=0.01244, audio_tagging_loss=0.008792, over 3025394.36 frames. 
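Note on the optim.py:476 lines: each reports five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, and the clipping threshold tracks Clipping_scale times the median; just above, threshold=1.865e+02 is exactly 2.0 x the 9.325e+01 median, and percent-clipped=0.0 means no recent batch exceeded it. Below is a self-contained sketch of that bookkeeping; the class and its parameters (e.g. the window size) are hypothetical, since the real optimizer folds this into its step, but the reported statistics are computed in the same way.

    import torch

    # Hedged sketch of quartile-based gradient clipping: keep a window of
    # recent global grad norms, clip to clipping_scale x median, and report
    # the quantiles plus the fraction of clipped batches.
    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=128):
            self.scale, self.window = clipping_scale, window
            self.norms, self.num_clipped, self.num_seen = [], 0, 0

        def __call__(self, params):
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms = (self.norms + [norm])[-self.window:]
            q = torch.tensor(self.norms).quantile(
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * q[2].item()  # 2.0 x median, as in the log
            self.num_seen += 1
            if norm > threshold:
                self.num_clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)
            print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
                  f"percent-clipped={100.0 * self.num_clipped / self.num_seen:.1f}")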
], batch size: 65, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:58:41,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3293513.3333333335, ans=0.2 2023-11-26 07:58:47,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3293580.0, ans=0.0 2023-11-26 07:58:57,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293580.0, ans=0.1 2023-11-26 07:59:01,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-26 07:59:02,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3293646.6666666665, ans=0.2 2023-11-26 07:59:13,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.710e+01 9.431e+01 1.020e+02 1.408e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:59:14,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3293713.3333333335, ans=0.125 2023-11-26 07:59:26,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3293780.0, ans=0.125 2023-11-26 07:59:27,299 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-11-26 07:59:33,781 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1100, loss[loss=0.07096, simple_loss=0.1065, pruned_loss=0.01143, audio_tagging_loss=0.006267, over 14617.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08971, pruned_loss=0.01233, audio_tagging_loss=0.008654, over 3029977.13 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:59:34,089 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:59:36,063 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:59:37,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3293846.6666666665, ans=0.0 2023-11-26 07:59:38,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3293846.6666666665, ans=0.0 2023-11-26 07:59:58,036 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-26 08:00:08,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3294046.6666666665, ans=0.025 2023-11-26 08:00:20,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3294113.3333333335, ans=0.125 2023-11-26 08:00:30,392 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1150, loss[loss=0.05164, simple_loss=0.06472, pruned_loss=0.0073, audio_tagging_loss=0.01198, over 15993.00 frames. 
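Note on the "Exclude cut" warnings: these 1-second AudioSet clips carry a dummy transcript, and after the ~4x convolutional subsampling they yield fewer encoder frames (23) than BPE tokens (24), which would make the transducer loss ill-defined, so the batch filter drops them. A sketch of such a filter follows; the function name and the exact frontend arithmetic are assumptions, though (100 - 7) // 4 = 23 does reproduce the logged numbers.

    # Hedged sketch of the cut filter behind the warnings above: a transducer
    # needs at least as many encoder frames as target tokens, so cuts that
    # violate this after subsampling are excluded from training.
    SUBSAMPLING_FACTOR = 4  # matches the config

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # assumed frontend arithmetic; reproduces the logged 100 -> 23
        frames_after = (num_frames - 7) // SUBSAMPLING_FACTOR
        if frames_after < num_tokens:
            print(f"WARNING: Exclude cut. Frames (before subsampling): {num_frames}. "
                  f"Frames (after subsampling): {frames_after}. Tokens: {num_tokens}.")
            return False
        return True

    assert keep_cut(100, 24) is False   # the case logged above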
], tot_loss[loss=0.06524, simple_loss=0.08869, pruned_loss=0.01218, audio_tagging_loss=0.008716, over 3036327.83 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:00:43,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.82 vs. limit=22.5 2023-11-26 08:00:45,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3294246.6666666665, ans=0.5 2023-11-26 08:00:53,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-26 08:00:55,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3294313.3333333335, ans=0.125 2023-11-26 08:01:03,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3294380.0, ans=0.125 2023-11-26 08:01:05,956 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.571e+01 9.145e+01 9.893e+01 1.532e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 08:01:26,211 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1200, loss[loss=0.06632, simple_loss=0.09086, pruned_loss=0.009945, audio_tagging_loss=0.01094, over 14503.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08876, pruned_loss=0.01216, audio_tagging_loss=0.008681, over 3035252.89 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:01:29,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3294513.3333333335, ans=0.2 2023-11-26 08:01:31,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3294513.3333333335, ans=0.125 2023-11-26 08:01:34,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294513.3333333335, ans=0.1 2023-11-26 08:01:49,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-26 08:02:07,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3294713.3333333335, ans=0.125 2023-11-26 08:02:21,999 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1250, loss[loss=0.06622, simple_loss=0.09137, pruned_loss=0.01227, audio_tagging_loss=0.008266, over 14405.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08844, pruned_loss=0.01216, audio_tagging_loss=0.008673, over 3045984.50 frames. 
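Note on the scaling.py:1022 "Whitening" lines: the Whiten module measures how close the covariance of a layer's activations (per channel group) is to a scaled identity, and only applies a corrective gradient when the metric exceeds the logged limit. The function below is a hedged reconstruction of such a metric, normalized so that perfectly white activations score ~1.0; the exact formula in scaling.py may differ.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # ~1.0 when each group's channel covariance is proportional to the
        # identity; grows as the activations become less "white".
        x = x.reshape(-1, x.shape[-1])                  # (frames, channels)
        num_frames, num_channels = x.shape
        ch = num_channels // num_groups
        xg = x.reshape(num_frames, num_groups, ch).transpose(0, 1)  # (g, T, ch)
        cov = xg.transpose(1, 2) @ xg / num_frames                  # (g, ch, ch)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (cov ** 2).sum(dim=(1, 2)).mean() / ch
        return mean_sq / (mean_diag ** 2 + 1e-20)

    # White noise scores near 1.0, comfortably under limits like 6.0 or 15.0:
    print(whitening_metric(torch.randn(1000, 128), num_groups=4))

On this reading, entries like metric=9.82 vs. limit=22.5 mean the activations are within bounds and no whitening penalty is active at that point.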
], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:02:46,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-26 08:02:53,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294980.0, ans=0.1 2023-11-26 08:02:58,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.635e+01 9.244e+01 9.927e+01 1.336e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 08:03:07,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295113.3333333335, ans=0.1 2023-11-26 08:03:15,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3295113.3333333335, ans=0.0 2023-11-26 08:03:16,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3295113.3333333335, ans=0.0 2023-11-26 08:03:18,633 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1300, loss[loss=0.06797, simple_loss=0.09352, pruned_loss=0.0123, audio_tagging_loss=0.008908, over 15565.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08813, pruned_loss=0.01209, audio_tagging_loss=0.00863, over 3034012.36 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:03:27,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295180.0, ans=0.1 2023-11-26 08:03:29,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3295246.6666666665, ans=0.125 2023-11-26 08:03:33,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3295246.6666666665, ans=0.125 2023-11-26 08:03:41,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.73 vs. limit=15.0 2023-11-26 08:03:42,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-26 08:03:49,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3295313.3333333335, ans=0.0 2023-11-26 08:03:58,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2023-11-26 08:04:14,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3295513.3333333335, ans=0.125 2023-11-26 08:04:14,937 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1350, loss[loss=0.07531, simple_loss=0.1052, pruned_loss=0.01584, audio_tagging_loss=0.006859, over 15238.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08748, pruned_loss=0.0121, audio_tagging_loss=0.008698, over 3031802.18 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:04:15,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3295513.3333333335, ans=0.125 2023-11-26 08:04:16,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.56 vs. 
limit=15.0 2023-11-26 08:04:24,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-26 08:04:38,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-26 08:04:45,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-26 08:04:45,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-26 08:04:46,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295646.6666666665, ans=0.1 2023-11-26 08:04:52,464 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.816e+01 9.406e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:04:55,784 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:04:57,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3295713.3333333335, ans=0.0 2023-11-26 08:05:02,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2023-11-26 08:05:06,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3295780.0, ans=0.0 2023-11-26 08:05:08,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0 2023-11-26 08:05:10,825 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1400, loss[loss=0.06154, simple_loss=0.08568, pruned_loss=0.01115, audio_tagging_loss=0.007557, over 16554.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08769, pruned_loss=0.01218, audio_tagging_loss=0.008798, over 3040789.46 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:05:10,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3295846.6666666665, ans=0.1 2023-11-26 08:05:16,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.90 vs. 
limit=15.0 2023-11-26 08:05:22,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3295913.3333333335, ans=0.0 2023-11-26 08:05:30,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3295913.3333333335, ans=0.125 2023-11-26 08:05:34,863 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-26 08:05:34,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3295980.0, ans=0.125 2023-11-26 08:06:07,635 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1450, loss[loss=0.06562, simple_loss=0.08645, pruned_loss=0.01207, audio_tagging_loss=0.01032, over 14849.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08862, pruned_loss=0.0124, audio_tagging_loss=0.008793, over 3039585.71 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:06:13,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3296180.0, ans=0.125 2023-11-26 08:06:13,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.82 vs. limit=10.0 2023-11-26 08:06:20,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3296246.6666666665, ans=0.0 2023-11-26 08:06:31,012 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-26 08:06:44,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.868e+01 9.341e+01 9.992e+01 1.188e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 08:06:45,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3296380.0, ans=0.125 2023-11-26 08:07:04,078 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1500, loss[loss=0.07732, simple_loss=0.09713, pruned_loss=0.01994, audio_tagging_loss=0.008816, over 16430.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.0894, pruned_loss=0.01267, audio_tagging_loss=0.008842, over 3042981.13 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:07:08,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3296513.3333333335, ans=0.125 2023-11-26 08:07:13,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3296580.0, ans=0.0 2023-11-26 08:07:16,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3296580.0, ans=0.0 2023-11-26 08:07:25,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3296646.6666666665, ans=0.0 2023-11-26 08:07:27,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-26 08:07:53,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3296780.0, ans=0.0 2023-11-26 08:07:59,526 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1550, loss[loss=0.08374, simple_loss=0.1077, pruned_loss=0.02094, audio_tagging_loss=0.008945, over 15353.00 frames. 
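Note: the per-batch loss decomposes exactly as the configured scales suggest, loss = simple_loss_scale x simple_loss + pruned_loss + audio_tagging_loss_scale x audio_tagging_loss. Checking batch 1450 above: 0.5 x 0.08645 + 0.01207 + 1.0 x 0.01032 = 0.06562, matching the logged value. A one-function sketch:

    # Combined objective with this run's scales (simple_loss_scale=0.5,
    # audio_tagging_loss_scale=1.0); reproduces the logged batch-1450 loss.
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(combined_loss(0.08645, 0.01207, 0.01032))  # ~0.06562, as logged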
], tot_loss[loss=0.06621, simple_loss=0.08971, pruned_loss=0.0125, audio_tagging_loss=0.008851, over 3049182.82 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:08:04,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3296846.6666666665, ans=0.125 2023-11-26 08:08:19,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3296913.3333333335, ans=0.1 2023-11-26 08:08:22,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-26 08:08:26,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3296980.0, ans=0.125 2023-11-26 08:08:32,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. limit=10.0 2023-11-26 08:08:36,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.636e+01 9.426e+01 1.014e+02 1.319e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:08:40,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3297046.6666666665, ans=0.1 2023-11-26 08:08:51,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3297113.3333333335, ans=0.0 2023-11-26 08:08:55,624 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1600, loss[loss=0.05884, simple_loss=0.07707, pruned_loss=0.01333, audio_tagging_loss=0.006978, over 14851.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08997, pruned_loss=0.01251, audio_tagging_loss=0.008894, over 3045804.35 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:08:57,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3297180.0, ans=0.0 2023-11-26 08:08:58,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3297180.0, ans=0.0 2023-11-26 08:09:11,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3297246.6666666665, ans=0.125 2023-11-26 08:09:18,994 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-26 08:09:51,640 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1650, loss[loss=0.05607, simple_loss=0.07459, pruned_loss=0.008812, audio_tagging_loss=0.009957, over 15558.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08916, pruned_loss=0.01225, audio_tagging_loss=0.009001, over 3053836.23 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:09:51,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3297513.3333333335, ans=0.1 2023-11-26 08:09:53,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.99 vs. 
limit=15.0 2023-11-26 08:10:15,078 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-26 08:10:15,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3297646.6666666665, ans=0.125 2023-11-26 08:10:19,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3297646.6666666665, ans=0.0 2023-11-26 08:10:28,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.801e+01 9.353e+01 1.001e+02 1.567e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:10:39,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3297780.0, ans=0.07 2023-11-26 08:10:41,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2023-11-26 08:10:47,515 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1700, loss[loss=0.05096, simple_loss=0.06561, pruned_loss=0.008267, audio_tagging_loss=0.009888, over 14038.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08942, pruned_loss=0.0123, audio_tagging_loss=0.008949, over 3062089.16 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:10:50,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3297846.6666666665, ans=0.0 2023-11-26 08:11:06,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3297913.3333333335, ans=0.1 2023-11-26 08:11:06,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3297913.3333333335, ans=0.125 2023-11-26 08:11:10,996 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-26 08:11:16,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3297980.0, ans=0.1 2023-11-26 08:11:25,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298046.6666666665, ans=0.1 2023-11-26 08:11:31,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3298113.3333333335, ans=0.05 2023-11-26 08:11:43,205 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1750, loss[loss=0.03982, simple_loss=0.05309, pruned_loss=0.005479, audio_tagging_loss=0.007793, over 14545.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08925, pruned_loss=0.01229, audio_tagging_loss=0.008895, over 3053429.47 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:11:46,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.83 vs. 
limit=15.0 2023-11-26 08:12:00,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3298246.6666666665, ans=0.125 2023-11-26 08:12:02,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3298246.6666666665, ans=0.125 2023-11-26 08:12:06,564 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-26 08:12:12,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3298313.3333333335, ans=0.0 2023-11-26 08:12:14,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0 2023-11-26 08:12:21,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.900e+01 9.496e+01 1.021e+02 1.422e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 08:12:25,829 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:12:26,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3298446.6666666665, ans=10.0 2023-11-26 08:12:34,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3298446.6666666665, ans=0.125 2023-11-26 08:12:39,606 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1800, loss[loss=0.0628, simple_loss=0.07802, pruned_loss=0.01589, audio_tagging_loss=0.0079, over 14962.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09044, pruned_loss=0.0125, audio_tagging_loss=0.008765, over 3057866.16 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:12:41,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-26 08:12:41,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-26 08:12:45,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-26 08:12:47,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-26 08:12:48,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-26 08:12:57,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3298580.0, ans=0.0 2023-11-26 08:12:58,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3298580.0, ans=0.95 2023-11-26 08:12:58,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298580.0, ans=0.1 2023-11-26 08:13:03,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-26 08:13:09,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3298646.6666666665, ans=0.2 2023-11-26 08:13:12,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-26 08:13:23,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3298780.0, ans=0.0 2023-11-26 08:13:35,244 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1850, loss[loss=0.05349, simple_loss=0.0685, pruned_loss=0.007685, audio_tagging_loss=0.01155, over 14300.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09096, pruned_loss=0.01267, audio_tagging_loss=0.008658, over 3062618.95 frames. 
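Note: grad_scale in the batch lines is the fp16 loss scale (the run has use_fp16=True), and its bouncing between 8.0, 16.0 and 32.0 across nearby batches is the usual dynamic-loss-scaling behaviour: halve on inf/nan gradients, grow again after a run of clean steps. The sketch below shows the standard torch.cuda.amp pattern; model, optimizer and batch are placeholders, and the training script may wrap the scaler, but the dynamics are the same.

    import torch

    # Dynamic fp16 loss scaling: the scaler halves its scale when it sees
    # overflowing grads and grows it after `growth_interval` clean steps,
    # which is why grad_scale oscillates between 8/16/32 in the log.
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # silently skips the update on overflow
        scaler.update()          # adjusts the scale; log via scaler.get_scale()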
], batch size: 55, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:13:43,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3298846.6666666665, ans=0.2 2023-11-26 08:13:50,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3298913.3333333335, ans=0.125 2023-11-26 08:13:59,433 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-26 08:14:06,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3298980.0, ans=0.125 2023-11-26 08:14:11,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3299046.6666666665, ans=0.0 2023-11-26 08:14:14,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.832e+01 9.434e+01 1.017e+02 1.223e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 08:14:15,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-26 08:14:24,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3299113.3333333335, ans=0.0 2023-11-26 08:14:27,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3299113.3333333335, ans=0.0 2023-11-26 08:14:31,939 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1900, loss[loss=0.06328, simple_loss=0.0841, pruned_loss=0.009963, audio_tagging_loss=0.01127, over 15963.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09136, pruned_loss=0.01272, audio_tagging_loss=0.008642, over 3058498.84 frames. ], batch size: 61, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:14:43,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-26 08:14:47,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-26 08:14:51,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-26 08:14:55,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-26 08:15:15,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3299446.6666666665, ans=0.125 2023-11-26 08:15:20,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2023-11-26 08:15:26,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.73 vs. limit=5.0 2023-11-26 08:15:26,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2023-11-26 08:15:27,391 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1950, loss[loss=0.07156, simple_loss=0.1012, pruned_loss=0.01422, audio_tagging_loss=0.006744, over 15567.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.0904, pruned_loss=0.01253, audio_tagging_loss=0.008627, over 3053640.24 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:15:51,030 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-26 08:16:06,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.592e+01 9.452e+01 9.958e+01 1.219e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 08:16:16,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3299780.0, ans=0.09899494936611666 2023-11-26 08:16:17,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3299780.0, ans=0.125 2023-11-26 08:16:21,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2023-11-26 08:16:23,456 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2000, loss[loss=0.06958, simple_loss=0.09364, pruned_loss=0.01496, audio_tagging_loss=0.007807, over 15046.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09002, pruned_loss=0.0125, audio_tagging_loss=0.008657, over 3054185.14 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:16:24,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3299846.6666666665, ans=0.125 2023-11-26 08:16:45,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3299980.0, ans=0.125 2023-11-26 08:16:47,414 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-26 08:17:02,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-26 08:17:18,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3300180.0, ans=0.125 2023-11-26 08:17:19,873 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2050, loss[loss=0.07626, simple_loss=0.09892, pruned_loss=0.01703, audio_tagging_loss=0.009765, over 16656.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08991, pruned_loss=0.01257, audio_tagging_loss=0.008734, over 3050191.68 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:17:38,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3300246.6666666665, ans=0.0 2023-11-26 08:17:43,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-26 08:17:49,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0 2023-11-26 08:17:58,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.680e+01 9.276e+01 1.017e+02 1.208e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 08:18:01,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-26 08:18:16,045 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2100, loss[loss=0.05683, simple_loss=0.07833, pruned_loss=0.009531, audio_tagging_loss=0.008135, over 15529.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08996, pruned_loss=0.01241, audio_tagging_loss=0.008709, over 3057210.03 frames. 
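Note: the "tot_loss[... over ~3.0e6 frames]" figures are not epoch totals. The fractional frame counts, and the way they hover near reset_interval (200) times the frames per batch (~15k), point to an exponentially decayed running sum: each batch's statistics are added to totals that decay by (1 - 1/reset_interval), and the displayed loss is the decayed loss sum divided by the decayed frame count. This is an inference from the logged numbers, sketched below with hypothetical names.

    # Hedged sketch of decayed running loss statistics consistent with the
    # "tot_loss[... over N frames]" entries: the frame count settles near
    # reset_interval x frames-per-batch (200 x ~15k = ~3.0e6) and stays
    # fractional because of the decay.
    RESET_INTERVAL = 200  # from the config

    class RunningLoss:
        def __init__(self):
            self.loss_sum, self.frames = 0.0, 0.0

        def update(self, loss_per_frame: float, batch_frames: int):
            decay = 1.0 - 1.0 / RESET_INTERVAL
            self.loss_sum = self.loss_sum * decay + loss_per_frame * batch_frames
            self.frames = self.frames * decay + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for _ in range(1000):
        tracker.update(0.066, 15000)
    print(f"tot_loss={tracker.value:.4f} over {tracker.frames:.2f} frames")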
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:18:39,012 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-26 08:18:42,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3300646.6666666665, ans=0.125 2023-11-26 08:18:42,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=15.0 2023-11-26 08:18:51,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2023-11-26 08:19:06,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-26 08:19:11,872 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2150, loss[loss=0.06836, simple_loss=0.09491, pruned_loss=0.01118, audio_tagging_loss=0.009726, over 14824.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09062, pruned_loss=0.01253, audio_tagging_loss=0.008707, over 3054126.81 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:19:16,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3300846.6666666665, ans=0.0 2023-11-26 08:19:23,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3300913.3333333335, ans=0.2 2023-11-26 08:19:35,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-26 08:19:44,893 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:19:49,343 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:19:51,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.871e+01 9.357e+01 1.023e+02 1.211e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:19:52,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3301046.6666666665, ans=0.125 2023-11-26 08:19:54,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2023-11-26 08:19:54,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2023-11-26 08:20:07,212 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2200, loss[loss=0.07604, simple_loss=0.1095, pruned_loss=0.0149, audio_tagging_loss=0.006399, over 15817.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09166, pruned_loss=0.01278, audio_tagging_loss=0.008669, over 3055462.94 frames. 
], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:20:19,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3301246.6666666665, ans=0.0 2023-11-26 08:20:28,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301246.6666666665, ans=0.1 2023-11-26 08:20:31,784 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-26 08:20:35,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3301313.3333333335, ans=0.125 2023-11-26 08:20:58,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3301446.6666666665, ans=10.0 2023-11-26 08:21:04,552 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2250, loss[loss=0.08012, simple_loss=0.1086, pruned_loss=0.01742, audio_tagging_loss=0.008379, over 15285.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09111, pruned_loss=0.01281, audio_tagging_loss=0.008711, over 3053373.44 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:21:08,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3301513.3333333335, ans=0.125 2023-11-26 08:21:27,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-26 08:21:33,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2023-11-26 08:21:43,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.823e+01 9.427e+01 1.035e+02 1.716e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:22:00,201 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2300, loss[loss=0.06893, simple_loss=0.09652, pruned_loss=0.0122, audio_tagging_loss=0.008473, over 14501.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09009, pruned_loss=0.01254, audio_tagging_loss=0.008795, over 3044170.74 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:22:15,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301913.3333333335, ans=0.1 2023-11-26 08:22:23,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-26 08:22:23,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3301980.0, ans=0.0 2023-11-26 08:22:28,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301980.0, ans=0.1 2023-11-26 08:22:46,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3302113.3333333335, ans=0.2 2023-11-26 08:22:47,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3302113.3333333335, ans=0.025 2023-11-26 08:22:48,094 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:22:53,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-26 08:22:55,589 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2350, loss[loss=0.06923, simple_loss=0.09553, pruned_loss=0.01197, audio_tagging_loss=0.009495, over 15777.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09071, pruned_loss=0.01273, audio_tagging_loss=0.008821, over 3046335.38 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:22:56,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3302180.0, ans=0.09899494936611666 2023-11-26 08:23:02,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3302180.0, ans=0.0 2023-11-26 08:23:06,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3302246.6666666665, ans=0.125 2023-11-26 08:23:06,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3302246.6666666665, ans=0.2 2023-11-26 08:23:20,229 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-26 08:23:34,835 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.737e+01 9.480e+01 1.014e+02 1.457e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 08:23:51,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3302513.3333333335, ans=0.125 2023-11-26 08:23:51,964 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2400, loss[loss=0.0604, simple_loss=0.07706, pruned_loss=0.009359, audio_tagging_loss=0.01252, over 15719.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08958, pruned_loss=0.0126, audio_tagging_loss=0.009041, over 3042019.95 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:24:15,473 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-26 08:24:47,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302846.6666666665, ans=0.1 2023-11-26 08:24:48,735 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2450, loss[loss=0.0531, simple_loss=0.06864, pruned_loss=0.007817, audio_tagging_loss=0.01096, over 14684.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08936, pruned_loss=0.01233, audio_tagging_loss=0.009198, over 3044690.39 frames. 
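Note: the learning rate drifting from 1.62e-03 to 1.61e-03 mid-epoch is consistent with an Eden-style schedule under this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5: the rate decays smoothly in both the batch and epoch counters, with no step changes. The post-warmup rule, evaluated near batch ~495k and epoch ~42, lands within rounding of the logged values (the script's exact epoch counter, which may be fractional, is an assumption here).

    # Eden-style learning-rate rule (post-warmup) with this run's config.
    def eden_lr(batch: float, epoch: float,
                base_lr: float = 0.045,
                lr_batches: float = 7500.0,
                lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(495000, 42):.2e}")  # ~1.60e-03, close to the logged rate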
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:24:51,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3302846.6666666665, ans=0.0 2023-11-26 08:24:55,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302846.6666666665, ans=0.1 2023-11-26 08:25:09,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3302980.0, ans=0.04949747468305833 2023-11-26 08:25:11,602 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-26 08:25:23,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3303046.6666666665, ans=0.0 2023-11-26 08:25:28,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.728e+01 9.406e+01 1.027e+02 1.574e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:25:32,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2023-11-26 08:25:43,707 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2500, loss[loss=0.05228, simple_loss=0.06219, pruned_loss=0.006488, audio_tagging_loss=0.0147, over 14592.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0901, pruned_loss=0.01245, audio_tagging_loss=0.009162, over 3044584.57 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:25:45,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3303180.0, ans=0.125 2023-11-26 08:26:02,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3303246.6666666665, ans=0.0 2023-11-26 08:26:07,794 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-26 08:26:10,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3303313.3333333335, ans=0.125 2023-11-26 08:26:31,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3303446.6666666665, ans=10.0 2023-11-26 08:26:39,655 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2550, loss[loss=0.08074, simple_loss=0.1091, pruned_loss=0.01873, audio_tagging_loss=0.007461, over 16041.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.0907, pruned_loss=0.01259, audio_tagging_loss=0.009041, over 3046574.57 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:27:03,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-26 08:27:05,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3303646.6666666665, ans=0.0 2023-11-26 08:27:18,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-11-26 08:27:19,457 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.568e+01 9.109e+01 1.007e+02 1.472e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 08:27:35,776 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2600, loss[loss=0.06075, simple_loss=0.07949, pruned_loss=0.01252, audio_tagging_loss=0.008493, over 14888.00 frames. 
], tot_loss[loss=0.06692, simple_loss=0.09093, pruned_loss=0.0126, audio_tagging_loss=0.008856, over 3055078.14 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:27:45,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3303913.3333333335, ans=0.2 2023-11-26 08:27:58,811 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-26 08:28:30,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-11-26 08:28:31,459 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2650, loss[loss=0.06659, simple_loss=0.09717, pruned_loss=0.009373, audio_tagging_loss=0.008635, over 14923.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0905, pruned_loss=0.01256, audio_tagging_loss=0.008837, over 3052607.80 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:28:41,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3304246.6666666665, ans=10.0 2023-11-26 08:28:54,921 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-26 08:28:55,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3304313.3333333335, ans=0.125 2023-11-26 08:29:05,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3304380.0, ans=0.125 2023-11-26 08:29:11,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3304380.0, ans=0.125 2023-11-26 08:29:11,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.718e+01 9.187e+01 9.929e+01 1.273e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 08:29:16,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3304446.6666666665, ans=0.125 2023-11-26 08:29:17,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3304446.6666666665, ans=0.125 2023-11-26 08:29:27,433 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2700, loss[loss=0.06943, simple_loss=0.1017, pruned_loss=0.01163, audio_tagging_loss=0.006936, over 15308.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08991, pruned_loss=0.01252, audio_tagging_loss=0.008886, over 3052760.05 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:29:28,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3304513.3333333335, ans=0.0 2023-11-26 08:29:39,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3304580.0, ans=0.2 2023-11-26 08:29:51,631 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495700 2023-11-26 08:29:58,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3304646.6666666665, ans=0.125 2023-11-26 08:30:01,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3304713.3333333335, ans=0.125 2023-11-26 08:30:01,592 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.97 vs. limit=10.0 2023-11-26 08:30:13,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2023-11-26 08:30:17,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3304780.0, ans=0.1 2023-11-26 08:30:17,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3304780.0, ans=0.07 2023-11-26 08:30:23,770 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2750, loss[loss=0.06743, simple_loss=0.08484, pruned_loss=0.01634, audio_tagging_loss=0.00867, over 14737.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09022, pruned_loss=0.01272, audio_tagging_loss=0.008815, over 3047467.02 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:30:34,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3304913.3333333335, ans=0.1 2023-11-26 08:30:40,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3304913.3333333335, ans=0.1 2023-11-26 08:30:46,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495750 2023-11-26 08:30:51,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3304980.0, ans=0.1 2023-11-26 08:31:03,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.804e+01 8.931e+01 9.557e+01 1.024e+02 1.484e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 08:31:10,387 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:31:19,452 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2800, loss[loss=0.07516, simple_loss=0.09493, pruned_loss=0.01464, audio_tagging_loss=0.01306, over 15560.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09062, pruned_loss=0.01273, audio_tagging_loss=0.008809, over 3047507.07 frames. 
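Note: many of the ScheduledFloat entries parameterize Balancer modules (names like balancer1.prob, min_positive=0.025, max_positive=0.95, max_abs=10.0 throughout the log). A Balancer is an activation regularizer: with the scheduled probability (the recurring ans=0.125) it adds a small gradient correction for channels whose fraction of positive activations, or whose magnitude, has left the configured range. The autograd sketch below is a toy illustration of that idea, not icefall's implementation.

    import torch

    # Toy Balancer: identity in the forward pass; in the backward pass, nudge
    # channels whose positive fraction or mean |activation| is out of bounds.
    class BalancerSketch(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive=0.05, max_positive=0.95, max_abs=10.0):
            ctx.save_for_backward(x)
            ctx.bounds = (min_positive, max_positive, max_abs)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            min_pos, max_pos, max_abs = ctx.bounds
            eps = 1e-4 * grad_out.abs().mean()
            pos_frac = (x > 0).float().mean(dim=0, keepdim=True)
            # negative gradient raises x where too few entries are positive,
            # positive gradient lowers it where too many are positive
            push = (pos_frac < min_pos).float() - (pos_frac > max_pos).float()
            # shrink channels whose mean |activation| exceeds max_abs
            too_big = (x.abs().mean(dim=0, keepdim=True) > max_abs).float()
            grad_fix = -eps * push + eps * too_big * x.sign()
            return grad_out + grad_fix, None, None, None

    y = BalancerSketch.apply(torch.randn(16, 256, requires_grad=True))
    y.sum().backward()  # gradients now carry the balancing nudges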
], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:31:19,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3305180.0, ans=0.125 2023-11-26 08:31:39,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3305246.6666666665, ans=0.125 2023-11-26 08:31:43,130 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495800 2023-11-26 08:31:46,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3305313.3333333335, ans=0.1 2023-11-26 08:31:58,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3305380.0, ans=0.1 2023-11-26 08:32:00,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3305380.0, ans=0.125 2023-11-26 08:32:01,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3305380.0, ans=0.125 2023-11-26 08:32:02,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3305380.0, ans=0.125 2023-11-26 08:32:08,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3305446.6666666665, ans=0.0 2023-11-26 08:32:15,822 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2850, loss[loss=0.05871, simple_loss=0.07544, pruned_loss=0.008164, audio_tagging_loss=0.01282, over 15574.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09017, pruned_loss=0.01275, audio_tagging_loss=0.008828, over 3046178.74 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:32:34,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3305580.0, ans=0.125 2023-11-26 08:32:39,490 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495850 2023-11-26 08:32:46,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3305646.6666666665, ans=0.125 2023-11-26 08:32:55,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.658e+01 9.306e+01 1.021e+02 1.225e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 08:33:04,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=22.5 2023-11-26 08:33:11,951 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2900, loss[loss=0.06258, simple_loss=0.0838, pruned_loss=0.01057, audio_tagging_loss=0.0101, over 14946.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08981, pruned_loss=0.01262, audio_tagging_loss=0.008755, over 3046315.73 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:33:13,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3305846.6666666665, ans=0.125 2023-11-26 08:33:24,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. 
limit=15.0 2023-11-26 08:33:35,380 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495900 2023-11-26 08:33:48,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3306046.6666666665, ans=0.04949747468305833 2023-11-26 08:34:07,731 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2950, loss[loss=0.04916, simple_loss=0.05922, pruned_loss=0.008882, audio_tagging_loss=0.01067, over 13652.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08998, pruned_loss=0.01252, audio_tagging_loss=0.008755, over 3050870.58 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:34:27,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3306246.6666666665, ans=0.125 2023-11-26 08:34:28,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3306246.6666666665, ans=0.125 2023-11-26 08:34:31,566 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495950 2023-11-26 08:34:40,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3306380.0, ans=0.125 2023-11-26 08:34:46,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2023-11-26 08:34:47,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.833e+01 9.371e+01 1.025e+02 1.490e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 08:35:01,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-26 08:35:03,556 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3000, loss[loss=0.06648, simple_loss=0.09761, pruned_loss=0.01072, audio_tagging_loss=0.006959, over 15187.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09032, pruned_loss=0.01239, audio_tagging_loss=0.008755, over 3050377.08 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:35:03,556 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 08:35:36,317 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05776, simple_loss=0.05062, pruned_loss=0.005203, audio_tagging_loss=0.02725, over 4681554.00 frames. 2023-11-26 08:35:36,318 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 08:35:38,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-26 08:35:44,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3306513.3333333335, ans=0.0 2023-11-26 08:35:58,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3306646.6666666665, ans=0.2 2023-11-26 08:35:59,060 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496000 2023-11-26 08:36:08,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. 
limit=6.0 2023-11-26 08:36:10,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3306713.3333333335, ans=0.2 2023-11-26 08:36:26,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0 2023-11-26 08:36:30,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2023-11-26 08:36:33,628 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3050, loss[loss=0.07837, simple_loss=0.1024, pruned_loss=0.01575, audio_tagging_loss=0.01144, over 15564.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0906, pruned_loss=0.01248, audio_tagging_loss=0.008892, over 3044575.08 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:36:43,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3306913.3333333335, ans=0.2 2023-11-26 08:36:50,815 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:36:54,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3306913.3333333335, ans=0.2 2023-11-26 08:36:57,590 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496050 2023-11-26 08:37:05,387 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:37:07,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2023-11-26 08:37:07,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3307046.6666666665, ans=0.1 2023-11-26 08:37:13,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.651e+01 9.305e+01 1.008e+02 1.239e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 08:37:26,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3307113.3333333335, ans=0.0 2023-11-26 08:37:26,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3307113.3333333335, ans=0.0 2023-11-26 08:37:29,291 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3100, loss[loss=0.06301, simple_loss=0.07969, pruned_loss=0.01249, audio_tagging_loss=0.01068, over 15340.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08963, pruned_loss=0.01223, audio_tagging_loss=0.008924, over 3043128.72 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:37:32,619 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. 
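limit=15.0

The recurring WARNING entries exclude AudioSet cuts whose feature length after subsampling (23 frames) falls below the token count of the dummy transcript (24 tokens), which would leave the transducer loss with no valid alignment. A hedged sketch of such a filter follows; the exact convolutional subsampling arithmetic is an assumption, chosen only because it is consistent with the logged "before subsampling: 100 / after subsampling: 23" pair.

def is_trainable(num_frames: int, num_tokens: int) -> bool:
    # Assumed roughly-4x subsampling arithmetic; maps 100 -> 23 as logged.
    frames_after = ((num_frames - 7) // 2) // 2
    # A transducer alignment needs at least one frame per token, so cuts
    # with fewer subsampled frames than tokens are dropped from training.
    return frames_after >= num_tokens

print(is_trainable(100, 24))  # False, matching the excluded dummy cuts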
2023-11-26 08:37:42,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3307246.6666666665, ans=0.125 2023-11-26 08:37:51,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=12.0 2023-11-26 08:37:53,478 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496100 2023-11-26 08:38:02,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3307380.0, ans=0.125 2023-11-26 08:38:05,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3307380.0, ans=0.125 2023-11-26 08:38:06,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3307380.0, ans=0.1 2023-11-26 08:38:25,848 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3150, loss[loss=0.08701, simple_loss=0.1173, pruned_loss=0.02026, audio_tagging_loss=0.008093, over 15607.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.0905, pruned_loss=0.01231, audio_tagging_loss=0.008993, over 3043193.46 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:38:27,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3307513.3333333335, ans=0.0 2023-11-26 08:38:49,553 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496150 2023-11-26 08:38:49,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3307646.6666666665, ans=0.0 2023-11-26 08:38:50,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-26 08:39:06,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 8.861e+01 9.326e+01 1.004e+02 1.383e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 08:39:12,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3307780.0, ans=0.0 2023-11-26 08:39:19,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3307780.0, ans=0.0 2023-11-26 08:39:22,073 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3200, loss[loss=0.0655, simple_loss=0.09189, pruned_loss=0.01245, audio_tagging_loss=0.007102, over 14492.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09064, pruned_loss=0.01237, audio_tagging_loss=0.009043, over 3042351.49 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:39:24,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307846.6666666665, ans=0.1 2023-11-26 08:39:45,671 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496200 2023-11-26 08:39:50,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.17 vs.
limit=10.0 2023-11-26 08:39:51,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3307980.0, ans=0.125 2023-11-26 08:40:10,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3308113.3333333335, ans=0.0 2023-11-26 08:40:18,472 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3250, loss[loss=0.06946, simple_loss=0.08942, pruned_loss=0.01299, audio_tagging_loss=0.01176, over 15063.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08986, pruned_loss=0.01231, audio_tagging_loss=0.009151, over 3051264.02 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:40:30,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3308246.6666666665, ans=0.07 2023-11-26 08:40:42,234 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496250 2023-11-26 08:40:42,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.48 vs. limit=10.0 2023-11-26 08:40:52,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2023-11-26 08:40:58,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.751e+01 9.386e+01 1.020e+02 1.370e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 08:41:14,547 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3300, loss[loss=0.07598, simple_loss=0.1042, pruned_loss=0.01606, audio_tagging_loss=0.007806, over 15086.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08988, pruned_loss=0.01224, audio_tagging_loss=0.009127, over 3044907.06 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:41:22,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3308513.3333333335, ans=0.125 2023-11-26 08:41:37,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496300 2023-11-26 08:41:45,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5 2023-11-26 08:41:48,574 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-11-26 08:42:00,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=22.5 2023-11-26 08:42:02,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-26 08:42:03,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=15.0 2023-11-26 08:42:10,607 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3350, loss[loss=0.09438, simple_loss=0.1298, pruned_loss=0.01937, audio_tagging_loss=0.01009, over 16111.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09097, pruned_loss=0.01246, audio_tagging_loss=0.008996, over 3037775.80 frames. 
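], batch size: 55, lr: 1.61e-03, grad_scale: 32.0

Each optim.py:476 entry prints five order statistics of recent gradient norms plus a clipping threshold. In every entry above, the threshold equals Clipping_scale times the middle value (for the entry just above: 2.0 * 9.386e+01 = 1.877e+02), so one plausible reconstruction is threshold = clipping_scale * median. The window over which norms are collected and the exact statistic used by optim.py are assumptions here, not confirmed by the log.

import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # min / 25% / median / 75% / max, as printed in the "grad-norm quartiles"
    # entries; the threshold rule below is inferred from those entries.
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2.0 * median reproduces the logged values
    return q, threshold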
2023-11-26 08:42:17,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3308846.6666666665, ans=0.2 2023-11-26 08:42:27,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3308913.3333333335, ans=0.1 2023-11-26 08:42:33,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496350 2023-11-26 08:42:42,794 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2023-11-26 08:42:50,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.810e+01 9.666e+01 1.064e+02 1.433e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 08:43:05,515 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3400, loss[loss=0.04795, simple_loss=0.07299, pruned_loss=0.005511, audio_tagging_loss=0.00594, over 15173.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09065, pruned_loss=0.0125, audio_tagging_loss=0.008928, over 3042084.37 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:43:16,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3309246.6666666665, ans=0.2 2023-11-26 08:43:19,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3309246.6666666665, ans=0.0 2023-11-26 08:43:29,289 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496400 2023-11-26 08:44:00,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3309513.3333333335, ans=0.5 2023-11-26 08:44:01,841 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3450, loss[loss=0.07432, simple_loss=0.1054, pruned_loss=0.01362, audio_tagging_loss=0.008017, over 15982.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09097, pruned_loss=0.01249, audio_tagging_loss=0.008832, over 3045151.30 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:44:24,933 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496450 2023-11-26 08:44:29,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3309646.6666666665, ans=0.1 2023-11-26 08:44:41,766 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.832e+01 9.547e+01 1.006e+02 1.211e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 08:44:45,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3309780.0, ans=0.125 2023-11-26 08:44:46,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=22.5 2023-11-26 08:44:49,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309780.0, ans=0.1 2023-11-26 08:44:57,755 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3500, loss[loss=0.06317, simple_loss=0.08709, pruned_loss=0.01049, audio_tagging_loss=0.009132, over 15326.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09162, pruned_loss=0.01264, audio_tagging_loss=0.008757, over 3050165.52 frames.
], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:45:00,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=12.0 2023-11-26 08:45:12,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3309913.3333333335, ans=0.125 2023-11-26 08:45:20,660 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496500 2023-11-26 08:45:25,958 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:45:36,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3310046.6666666665, ans=0.1 2023-11-26 08:45:53,132 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3550, loss[loss=0.04625, simple_loss=0.05753, pruned_loss=0.006985, audio_tagging_loss=0.0105, over 14460.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09023, pruned_loss=0.0125, audio_tagging_loss=0.008792, over 3045264.21 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:46:01,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3310180.0, ans=0.0 2023-11-26 08:46:09,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3310246.6666666665, ans=0.125 2023-11-26 08:46:11,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3310246.6666666665, ans=0.0 2023-11-26 08:46:16,869 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496550 2023-11-26 08:46:23,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3310313.3333333335, ans=0.125 2023-11-26 08:46:33,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.432e+01 9.183e+01 9.736e+01 1.809e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 08:46:48,259 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3600, loss[loss=0.07081, simple_loss=0.0995, pruned_loss=0.01389, audio_tagging_loss=0.007175, over 15442.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08939, pruned_loss=0.01223, audio_tagging_loss=0.008707, over 3043271.81 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:47:12,057 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496600 2023-11-26 08:47:22,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-11-26 08:47:33,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3310780.0, ans=0.5 2023-11-26 08:47:45,309 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3650, loss[loss=0.0689, simple_loss=0.09153, pruned_loss=0.01338, audio_tagging_loss=0.00976, over 16187.00 frames. 
], tot_loss[loss=0.06565, simple_loss=0.08948, pruned_loss=0.01225, audio_tagging_loss=0.00866, over 3043160.11 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:47:49,733 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:47:52,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3310846.6666666665, ans=0.1 2023-11-26 08:48:08,425 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496650 2023-11-26 08:48:10,711 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:48:28,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2023-11-26 08:48:29,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.589e+01 9.068e+01 9.988e+01 1.098e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-26 08:48:37,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0 2023-11-26 08:48:40,911 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3700, loss[loss=0.07355, simple_loss=0.1009, pruned_loss=0.01625, audio_tagging_loss=0.006864, over 14740.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09045, pruned_loss=0.01239, audio_tagging_loss=0.008609, over 3046398.95 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:48:42,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3311180.0, ans=0.125 2023-11-26 08:48:59,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311246.6666666665, ans=0.1 2023-11-26 08:49:04,827 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496700 2023-11-26 08:49:08,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311313.3333333335, ans=0.1 2023-11-26 08:49:14,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3311380.0, ans=0.1 2023-11-26 08:49:28,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311446.6666666665, ans=0.1 2023-11-26 08:49:30,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-11-26 08:49:33,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5 2023-11-26 08:49:34,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3311446.6666666665, ans=0.125 2023-11-26 08:49:36,479 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3750, loss[loss=0.0654, simple_loss=0.08752, pruned_loss=0.01258, audio_tagging_loss=0.009054, over 15437.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.0906, pruned_loss=0.01248, audio_tagging_loss=0.008636, over 3053537.37 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:49:46,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3311513.3333333335, ans=0.1 2023-11-26 08:49:52,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3311580.0, ans=0.0 2023-11-26 08:50:00,822 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496750 2023-11-26 08:50:06,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3311646.6666666665, ans=0.0 2023-11-26 08:50:13,667 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:50:18,783 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2023-11-26 08:50:20,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.835e+01 9.456e+01 1.002e+02 1.375e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 08:50:33,736 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3800, loss[loss=0.07384, simple_loss=0.1069, pruned_loss=0.01493, audio_tagging_loss=0.005471, over 15595.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09014, pruned_loss=0.01248, audio_tagging_loss=0.008683, over 3048550.47 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:50:52,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3311913.3333333335, ans=0.2 2023-11-26 08:50:56,193 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496800 2023-11-26 08:51:00,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2023-11-26 08:51:07,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3312046.6666666665, ans=0.0 2023-11-26 08:51:09,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3312046.6666666665, ans=0.2 2023-11-26 08:51:20,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3312113.3333333335, ans=0.125 2023-11-26 08:51:29,071 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3850, loss[loss=0.07359, simple_loss=0.1046, pruned_loss=0.01411, audio_tagging_loss=0.007175, over 15503.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09003, pruned_loss=0.01255, audio_tagging_loss=0.008768, over 3046984.90 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:51:37,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3312180.0, ans=0.0 2023-11-26 08:51:43,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.34 vs. 
limit=22.5 2023-11-26 08:51:52,576 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496850 2023-11-26 08:52:07,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3312380.0, ans=0.0 2023-11-26 08:52:12,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.753e+01 9.436e+01 1.032e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 08:52:24,965 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3900, loss[loss=0.07366, simple_loss=0.08829, pruned_loss=0.01648, audio_tagging_loss=0.01303, over 15797.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08997, pruned_loss=0.01258, audio_tagging_loss=0.008829, over 3045088.29 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:52:49,051 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496900 2023-11-26 08:53:00,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3312713.3333333335, ans=0.0 2023-11-26 08:53:01,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3312713.3333333335, ans=0.1 2023-11-26 08:53:10,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3312780.0, ans=0.125 2023-11-26 08:53:21,509 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3950, loss[loss=0.06754, simple_loss=0.08459, pruned_loss=0.0146, audio_tagging_loss=0.01064, over 15380.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.0902, pruned_loss=0.01252, audio_tagging_loss=0.008895, over 3042028.89 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:53:22,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.27 vs. 
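limit=15.0

The scaling.py:213 entries report ScheduledFloat values (the "ans=" field) as a function of batch_count; at this point in training (batch_count around 3.3e6) most of them sit at a constant end-of-schedule value. Below is a minimal sketch of a piecewise-linear, batch-count-keyed schedule; the class name matches the log, but the breakpoints and the implementation are illustrative assumptions, not the scaling.py source.

class ScheduledFloat:
    """Piecewise-linear schedule over batch_count, flat outside the endpoints."""

    def __init__(self, *points):  # points: (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical dropout schedule: far past the final breakpoint the value
# stays pinned at its end value, as in the "ans=" entries above.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3312846.0))  # 0.1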
2023-11-26 08:53:24,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1 2023-11-26 08:53:30,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3312846.6666666665, ans=0.125 2023-11-26 08:53:31,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-26 08:53:39,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3312913.3333333335, ans=0.2 2023-11-26 08:53:41,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-26 08:53:44,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496950 2023-11-26 08:53:45,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3312980.0, ans=0.125 2023-11-26 08:53:50,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3312980.0, ans=0.125 2023-11-26 08:54:03,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3313046.6666666665, ans=0.0 2023-11-26 08:54:05,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.614e+01 9.558e+01 1.040e+02 1.255e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 08:54:08,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3313113.3333333335, ans=0.1 2023-11-26 08:54:08,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3313113.3333333335, ans=0.125 2023-11-26 08:54:09,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3313113.3333333335, ans=0.125 2023-11-26 08:54:16,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3313180.0, ans=0.0 2023-11-26 08:54:17,367 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4000, loss[loss=0.08402, simple_loss=0.1274, pruned_loss=0.0127, audio_tagging_loss=0.007599, over 15666.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09113, pruned_loss=0.01257, audio_tagging_loss=0.008928, over 3040922.06 frames.
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:54:41,104 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497000 2023-11-26 08:54:43,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3313313.3333333335, ans=0.1 2023-11-26 08:54:46,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3313313.3333333335, ans=0.125 2023-11-26 08:54:46,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3313313.3333333335, ans=0.0 2023-11-26 08:55:12,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2023-11-26 08:55:13,073 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4050, loss[loss=0.06719, simple_loss=0.09124, pruned_loss=0.01253, audio_tagging_loss=0.009042, over 15166.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09111, pruned_loss=0.01272, audio_tagging_loss=0.009014, over 3032864.46 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:55:14,669 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:55:19,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3313513.3333333335, ans=0.0 2023-11-26 08:55:31,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3313580.0, ans=0.2 2023-11-26 08:55:35,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3313646.6666666665, ans=0.0 2023-11-26 08:55:37,092 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-26 08:55:41,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0 2023-11-26 08:55:52,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3313713.3333333335, ans=0.0 2023-11-26 08:55:57,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.799e+01 9.457e+01 1.021e+02 1.367e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 08:56:09,658 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4100, loss[loss=0.08552, simple_loss=0.123, pruned_loss=0.01792, audio_tagging_loss=0.006117, over 15708.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09097, pruned_loss=0.01267, audio_tagging_loss=0.00901, over 3036056.40 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:56:33,323 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-26 08:56:37,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3313980.0, ans=0.125 2023-11-26 08:56:55,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3314113.3333333335, ans=0.0 2023-11-26 08:57:05,850 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4150, loss[loss=0.06049, simple_loss=0.08575, pruned_loss=0.01141, audio_tagging_loss=0.006208, over 14686.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09139, pruned_loss=0.01269, audio_tagging_loss=0.008798, over 3030707.82 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:57:09,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3314180.0, ans=0.2 2023-11-26 08:57:10,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0 2023-11-26 08:57:10,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-26 08:57:15,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3314180.0, ans=0.95 2023-11-26 08:57:29,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-26 08:57:35,271 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:57:43,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3314380.0, ans=0.125 2023-11-26 08:57:45,173 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:57:49,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.533e+01 8.973e+01 9.473e+01 1.014e+02 1.383e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 08:57:56,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3314446.6666666665, ans=0.125 2023-11-26 08:58:00,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3314513.3333333335, ans=0.0 2023-11-26 08:58:01,785 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4200, loss[loss=0.0767, simple_loss=0.1051, pruned_loss=0.01651, audio_tagging_loss=0.007655, over 14925.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09091, pruned_loss=0.0123, audio_tagging_loss=0.008684, over 3041218.26 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:58:02,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3314513.3333333335, ans=0.125 2023-11-26 08:58:06,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3314513.3333333335, ans=0.0 2023-11-26 08:58:09,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3314513.3333333335, ans=0.125 2023-11-26 08:58:09,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3314513.3333333335, ans=0.125 2023-11-26 08:58:25,587 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-26 08:58:28,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3314646.6666666665, ans=0.1 2023-11-26 08:58:43,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3314713.3333333335, ans=0.125 2023-11-26 08:58:53,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3314780.0, ans=0.0 2023-11-26 08:58:56,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3314780.0, ans=0.125 2023-11-26 08:58:58,271 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4250, loss[loss=0.06004, simple_loss=0.07019, pruned_loss=0.01459, audio_tagging_loss=0.01036, over 15314.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09046, pruned_loss=0.01239, audio_tagging_loss=0.008638, over 3040294.39 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:59:10,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3314913.3333333335, ans=0.0 2023-11-26 08:59:21,169 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-26 08:59:40,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315046.6666666665, ans=0.1 2023-11-26 08:59:41,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.666e+01 9.281e+01 9.909e+01 1.116e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 08:59:45,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3315113.3333333335, ans=0.125 2023-11-26 08:59:54,040 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4300, loss[loss=0.0729, simple_loss=0.1047, pruned_loss=0.01441, audio_tagging_loss=0.006157, over 15426.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09093, pruned_loss=0.01243, audio_tagging_loss=0.008527, over 3039280.21 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:00:02,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3315180.0, ans=0.0 2023-11-26 09:00:17,441 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-26 09:00:37,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3315446.6666666665, ans=0.0 2023-11-26 09:00:47,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2023-11-26 09:00:49,345 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4350, loss[loss=0.06763, simple_loss=0.08714, pruned_loss=0.013, audio_tagging_loss=0.01106, over 15326.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09016, pruned_loss=0.01232, audio_tagging_loss=0.008592, over 3031884.58 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:00:59,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315513.3333333335, ans=0.1 2023-11-26 09:01:02,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315580.0, ans=0.1 2023-11-26 09:01:06,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3315580.0, ans=0.1 2023-11-26 09:01:13,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-26 09:01:17,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3315646.6666666665, ans=0.125 2023-11-26 09:01:28,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2023-11-26 09:01:33,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.733e+01 9.373e+01 9.862e+01 1.351e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:01:38,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3315780.0, ans=0.125 2023-11-26 09:01:46,622 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4400, loss[loss=0.06347, simple_loss=0.07944, pruned_loss=0.01403, audio_tagging_loss=0.009726, over 14951.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08919, pruned_loss=0.01222, audio_tagging_loss=0.008718, over 3036581.41 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:01:55,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3315846.6666666665, ans=0.035 2023-11-26 09:01:57,088 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.19 vs. 
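limit=15.0

The scaling.py:1022 entries compare a per-module whitening metric against a scheduled limit, and appear to be printed when the metric approaches or exceeds that limit, i.e. when a module's feature covariance is far from isotropic. The statistic below is an illustrative stand-in with the right qualitative behavior (exactly 1.0 for a perfectly white covariance, growing as the eigenvalue spread widens); the actual definition in scaling.py may differ.

import torch

def whitening_metric(feats: torch.Tensor) -> float:
    # feats: (num_frames, num_channels). Ratio of the mean squared eigenvalue
    # of the channel covariance to the squared mean eigenvalue: 1.0 when all
    # eigenvalues are equal (white), larger when a few directions dominate.
    feats = feats - feats.mean(dim=0)
    cov = feats.t() @ feats / feats.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float(eigs.pow(2).mean() / eigs.mean().pow(2))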
2023-11-26 09:02:09,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-26 09:02:11,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3315980.0, ans=0.2 2023-11-26 09:02:15,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3315980.0, ans=0.025 2023-11-26 09:02:31,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3316113.3333333335, ans=0.125 2023-11-26 09:02:33,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3316113.3333333335, ans=0.0 2023-11-26 09:02:42,682 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4450, loss[loss=0.08122, simple_loss=0.1132, pruned_loss=0.01811, audio_tagging_loss=0.006513, over 15362.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08992, pruned_loss=0.0125, audio_tagging_loss=0.008713, over 3045380.89 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:02:54,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-26 09:02:57,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-26 09:03:03,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3316313.3333333335, ans=0.05 2023-11-26 09:03:06,118 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-26 09:03:20,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3316380.0, ans=0.125 2023-11-26 09:03:26,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 8.913e+01 9.793e+01 1.057e+02 1.326e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 09:03:30,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3316446.6666666665, ans=0.2 2023-11-26 09:03:32,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3316446.6666666665, ans=0.0 2023-11-26 09:03:37,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-26 09:03:37,882 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4500, loss[loss=0.07196, simple_loss=0.09815, pruned_loss=0.01465, audio_tagging_loss=0.008232, over 16911.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09092, pruned_loss=0.01266, audio_tagging_loss=0.008607, over 3054613.32 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:03:41,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.53 vs.
limit=15.0 2023-11-26 09:03:53,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3316580.0, ans=0.0 2023-11-26 09:04:01,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3316646.6666666665, ans=0.2 2023-11-26 09:04:02,425 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-26 09:04:03,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-26 09:04:19,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3316713.3333333335, ans=0.2 2023-11-26 09:04:29,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3316780.0, ans=0.0 2023-11-26 09:04:34,320 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4550, loss[loss=0.07876, simple_loss=0.1134, pruned_loss=0.0169, audio_tagging_loss=0.005153, over 16051.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09146, pruned_loss=0.01276, audio_tagging_loss=0.008676, over 3057709.52 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:04:58,035 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497550 2023-11-26 09:04:59,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3316980.0, ans=0.125 2023-11-26 09:05:04,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3316980.0, ans=0.125 2023-11-26 09:05:16,128 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:05:19,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.805e+01 9.410e+01 9.881e+01 1.547e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 09:05:20,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3317113.3333333335, ans=10.0 2023-11-26 09:05:30,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3317180.0, ans=0.2 2023-11-26 09:05:31,125 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4600, loss[loss=0.06198, simple_loss=0.08474, pruned_loss=0.009713, audio_tagging_loss=0.0099, over 15394.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09004, pruned_loss=0.01242, audio_tagging_loss=0.008856, over 3056626.39 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:05:31,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3317180.0, ans=0.5 2023-11-26 09:05:53,470 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497600 2023-11-26 09:06:11,140 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:06:18,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3317446.6666666665, ans=0.0 2023-11-26 09:06:20,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3317446.6666666665, ans=0.2 2023-11-26 09:06:22,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3317446.6666666665, ans=10.0 2023-11-26 09:06:27,038 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4650, loss[loss=0.05537, simple_loss=0.07616, pruned_loss=0.00947, audio_tagging_loss=0.007824, over 15469.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09038, pruned_loss=0.01237, audio_tagging_loss=0.008816, over 3057683.66 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:06:40,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3317580.0, ans=0.2 2023-11-26 09:06:47,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3317580.0, ans=0.125 2023-11-26 09:06:51,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497650 2023-11-26 09:06:57,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3317646.6666666665, ans=0.1 2023-11-26 09:07:12,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.826e+01 9.427e+01 1.038e+02 1.331e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 09:07:22,786 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4700, loss[loss=0.05018, simple_loss=0.06811, pruned_loss=0.006682, audio_tagging_loss=0.009441, over 15796.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09061, pruned_loss=0.01248, audio_tagging_loss=0.008983, over 3057880.84 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:07:23,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2023-11-26 09:07:37,228 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.08 vs. 
limit=10.0 2023-11-26 09:07:46,142 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497700 2023-11-26 09:07:56,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3318046.6666666665, ans=0.0 2023-11-26 09:07:57,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3318046.6666666665, ans=0.125 2023-11-26 09:08:01,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3318046.6666666665, ans=0.1 2023-11-26 09:08:05,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3318113.3333333335, ans=0.0 2023-11-26 09:08:18,962 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4750, loss[loss=0.09087, simple_loss=0.1272, pruned_loss=0.02111, audio_tagging_loss=0.006166, over 16356.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09149, pruned_loss=0.01271, audio_tagging_loss=0.008944, over 3060442.07 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:08:19,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3318180.0, ans=0.125 2023-11-26 09:08:33,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3318246.6666666665, ans=0.125 2023-11-26 09:08:41,268 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497750 2023-11-26 09:08:46,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3318313.3333333335, ans=0.1 2023-11-26 09:08:50,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3318380.0, ans=0.125 2023-11-26 09:09:04,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.590e+01 9.271e+01 9.941e+01 1.309e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 09:09:10,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=22.5 2023-11-26 09:09:14,027 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4800, loss[loss=0.07095, simple_loss=0.09116, pruned_loss=0.01514, audio_tagging_loss=0.01024, over 15270.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08986, pruned_loss=0.01255, audio_tagging_loss=0.009099, over 3061659.93 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:09:25,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.70 vs. limit=10.0 2023-11-26 09:09:35,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3318646.6666666665, ans=0.0 2023-11-26 09:09:37,496 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497800 2023-11-26 09:09:44,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.95 vs. 
limit=12.0 2023-11-26 09:09:46,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3318646.6666666665, ans=0.0 2023-11-26 09:09:58,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3318780.0, ans=0.0 2023-11-26 09:09:58,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3318780.0, ans=0.0 2023-11-26 09:10:03,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3318780.0, ans=0.125 2023-11-26 09:10:10,114 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4850, loss[loss=0.06946, simple_loss=0.08878, pruned_loss=0.01191, audio_tagging_loss=0.01316, over 15871.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08973, pruned_loss=0.01251, audio_tagging_loss=0.009167, over 3056404.49 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:10:11,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3318846.6666666665, ans=0.125 2023-11-26 09:10:15,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2023-11-26 09:10:34,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497850 2023-11-26 09:10:47,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3319046.6666666665, ans=0.125 2023-11-26 09:10:55,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.672e+01 9.359e+01 1.001e+02 1.200e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:11:06,512 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4900, loss[loss=0.06064, simple_loss=0.08046, pruned_loss=0.009069, audio_tagging_loss=0.01134, over 14441.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09025, pruned_loss=0.01252, audio_tagging_loss=0.009179, over 3045483.24 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:11:11,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3319180.0, ans=0.125 2023-11-26 09:11:28,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497900 2023-11-26 09:12:01,620 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4950, loss[loss=0.06502, simple_loss=0.08683, pruned_loss=0.01247, audio_tagging_loss=0.009135, over 15176.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09025, pruned_loss=0.01245, audio_tagging_loss=0.009015, over 3045246.36 frames. 
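], batch size: 57, lr: 1.61e-03, grad_scale: 16.0

A note on the loss fields in the train_asr.py entries: across the logged batches, the reported loss matches 0.5*simple_loss + pruned_loss + audio_tagging_loss to within rounding (e.g. the batch 4950 totals just above). A minimal sketch of that combination; the 0.5 and 1.0 scales are inferred from the logged numbers, not read out of train_asr.py:

    # Sketch only: loss scales inferred from the logged values above.
    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        """Reconstruct the logged 'loss' from its logged components."""
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # tot_loss for epoch 42, batch 4950 above: loss=0.06659
    assert abs(combine_losses(0.09025, 0.01245, 0.009015) - 0.06659) < 5e-5
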
2023-11-26 09:12:08,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3319513.3333333335, ans=0.0 2023-11-26 09:12:20,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3319580.0, ans=0.125 2023-11-26 09:12:24,935 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497950 2023-11-26 09:12:36,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3319713.3333333335, ans=0.1 2023-11-26 09:12:46,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.822e+01 9.528e+01 1.003e+02 1.233e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 09:12:56,701 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5000, loss[loss=0.07574, simple_loss=0.09988, pruned_loss=0.01775, audio_tagging_loss=0.008043, over 15451.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08956, pruned_loss=0.01239, audio_tagging_loss=0.008897, over 3038964.25 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:12:59,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3319846.6666666665, ans=0.0 2023-11-26 09:13:04,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3319846.6666666665, ans=0.2 2023-11-26 09:13:20,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3319980.0, ans=0.2 2023-11-26 09:13:21,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498000 2023-11-26 09:13:53,747 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5050, loss[loss=0.0769, simple_loss=0.1086, pruned_loss=0.01478, audio_tagging_loss=0.007831, over 15006.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08904, pruned_loss=0.01238, audio_tagging_loss=0.008908, over 3041108.55 frames.
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:13:58,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3320180.0, ans=0.125 2023-11-26 09:14:06,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3320246.6666666665, ans=0.125 2023-11-26 09:14:09,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3320246.6666666665, ans=0.125 2023-11-26 09:14:17,105 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498050 2023-11-26 09:14:25,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3320313.3333333335, ans=0.0 2023-11-26 09:14:25,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3320313.3333333335, ans=0.0 2023-11-26 09:14:26,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3320380.0, ans=0.5 2023-11-26 09:14:39,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.596e+01 9.338e+01 9.985e+01 1.178e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 09:14:48,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3320446.6666666665, ans=0.1 2023-11-26 09:14:49,960 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5100, loss[loss=0.04873, simple_loss=0.05679, pruned_loss=0.01037, audio_tagging_loss=0.00997, over 13321.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0883, pruned_loss=0.01239, audio_tagging_loss=0.00887, over 3042689.66 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:14:56,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3320513.3333333335, ans=0.125 2023-11-26 09:15:00,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3320580.0, ans=0.1 2023-11-26 09:15:03,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3320580.0, ans=0.2 2023-11-26 09:15:09,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3320580.0, ans=0.0 2023-11-26 09:15:12,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498100 2023-11-26 09:15:45,335 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5150, loss[loss=0.05553, simple_loss=0.0752, pruned_loss=0.00855, audio_tagging_loss=0.009376, over 14880.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08888, pruned_loss=0.01229, audio_tagging_loss=0.008808, over 3038879.43 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:16:09,424 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498150 2023-11-26 09:16:24,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3321046.6666666665, ans=0.0 2023-11-26 09:16:25,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3321046.6666666665, ans=0.0 2023-11-26 09:16:26,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3321046.6666666665, ans=0.125 2023-11-26 09:16:27,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2023-11-26 09:16:31,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.833e+01 9.269e+01 1.033e+02 1.245e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 09:16:41,830 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5200, loss[loss=0.08154, simple_loss=0.1074, pruned_loss=0.01849, audio_tagging_loss=0.009346, over 14260.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09001, pruned_loss=0.0125, audio_tagging_loss=0.008702, over 3035683.09 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:17:05,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498200 2023-11-26 09:17:06,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3321313.3333333335, ans=0.125 2023-11-26 09:17:19,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3321380.0, ans=0.2 2023-11-26 09:17:25,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3321446.6666666665, ans=0.125 2023-11-26 09:17:32,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3321446.6666666665, ans=0.05 2023-11-26 09:17:35,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321446.6666666665, ans=0.1 2023-11-26 09:17:37,759 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5250, loss[loss=0.0546, simple_loss=0.0738, pruned_loss=0.01016, audio_tagging_loss=0.007538, over 14890.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09073, pruned_loss=0.01275, audio_tagging_loss=0.008582, over 3036951.89 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:18:01,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498250 2023-11-26 09:18:09,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-26 09:18:14,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3321713.3333333335, ans=0.0 2023-11-26 09:18:17,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3321713.3333333335, ans=0.125 2023-11-26 09:18:18,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3321713.3333333335, ans=0.1 2023-11-26 09:18:21,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3321780.0, ans=0.0 2023-11-26 09:18:23,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.788e+01 9.359e+01 1.015e+02 1.795e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:18:26,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3321780.0, ans=0.0 2023-11-26 09:18:33,628 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5300, loss[loss=0.06329, simple_loss=0.08463, pruned_loss=0.009774, audio_tagging_loss=0.0112, over 14511.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09136, pruned_loss=0.0128, audio_tagging_loss=0.008611, over 3036867.80 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:18:56,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3321980.0, ans=0.125 2023-11-26 09:18:57,461 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498300 2023-11-26 09:18:58,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3321980.0, ans=0.0 2023-11-26 09:19:05,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3321980.0, ans=0.2 2023-11-26 09:19:08,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3322046.6666666665, ans=0.125 2023-11-26 09:19:17,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3322113.3333333335, ans=0.07 2023-11-26 09:19:18,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3322113.3333333335, ans=0.125 2023-11-26 09:19:29,806 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5350, loss[loss=0.06371, simple_loss=0.08998, pruned_loss=0.009974, audio_tagging_loss=0.008751, over 15565.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09218, pruned_loss=0.01278, audio_tagging_loss=0.008593, over 3041504.93 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:19:41,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3322246.6666666665, ans=0.2 2023-11-26 09:19:42,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3322246.6666666665, ans=0.2 2023-11-26 09:19:43,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3322246.6666666665, ans=0.125 2023-11-26 09:19:47,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3322246.6666666665, ans=0.125 2023-11-26 09:19:52,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498350 2023-11-26 09:20:02,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3322380.0, ans=0.1 2023-11-26 09:20:04,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3322380.0, ans=0.0 2023-11-26 09:20:10,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3322380.0, ans=0.0 2023-11-26 09:20:15,711 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:20:16,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.636e+01 9.507e+01 1.021e+02 1.196e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 09:20:17,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3322446.6666666665, ans=0.2 2023-11-26 09:20:20,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-26 09:20:25,609 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5400, loss[loss=0.07351, simple_loss=0.1006, pruned_loss=0.01507, audio_tagging_loss=0.00814, over 15434.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08999, pruned_loss=0.01237, audio_tagging_loss=0.008681, over 3045895.29 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:20:32,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3322513.3333333335, ans=0.1 2023-11-26 09:20:48,050 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:20:48,996 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498400 2023-11-26 09:21:03,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3322713.3333333335, ans=0.07 2023-11-26 09:21:07,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3322713.3333333335, ans=0.07 2023-11-26 09:21:20,961 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5450, loss[loss=0.07361, simple_loss=0.09286, pruned_loss=0.01775, audio_tagging_loss=0.009429, over 14643.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09092, pruned_loss=0.01265, audio_tagging_loss=0.008607, over 3040615.66 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:21:25,526 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=12.0 2023-11-26 09:21:37,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3322913.3333333335, ans=0.0 2023-11-26 09:21:44,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3322980.0, ans=0.5 2023-11-26 09:21:45,559 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498450 2023-11-26 09:21:56,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3323046.6666666665, ans=0.125 2023-11-26 09:22:08,291 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.615e+01 9.390e+01 1.017e+02 1.312e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 09:22:08,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3323113.3333333335, ans=0.125 2023-11-26 09:22:17,204 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5500, loss[loss=0.05257, simple_loss=0.07151, pruned_loss=0.007031, audio_tagging_loss=0.009786, over 16679.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09071, pruned_loss=0.01257, audio_tagging_loss=0.008629, over 3041333.27 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:22:18,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2023-11-26 09:22:23,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-26 09:22:39,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2023-11-26 09:22:40,714 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498500 2023-11-26 09:22:51,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323380.0, ans=0.1 2023-11-26 09:23:13,562 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5550, loss[loss=0.07578, simple_loss=0.09493, pruned_loss=0.01787, audio_tagging_loss=0.01043, over 15166.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.0903, pruned_loss=0.01252, audio_tagging_loss=0.008699, over 3039351.60 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:23:17,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3323513.3333333335, ans=0.0 2023-11-26 09:23:18,018 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:23:22,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.77 vs. limit=15.0 2023-11-26 09:23:26,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. 
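limit=10.0

The ScheduledFloat entries that dominate this log come from scaling.py and report module constants (skip rates, balancer probabilities, scale minima) whose current value 'ans' is scheduled against the global batch count. A hedged sketch of such a schedule as a piecewise-linear function; the breakpoints below are purely illustrative, not the schedules actually defined in scaling.py:

    # Illustrative only: 'ans' as a piecewise-linear function of batch_count.
    def scheduled_float(batch_count, schedule):
        """schedule: sorted (batch_count, value) pairs, clamped at both ends."""
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation
            x0, y0 = x1, y1
        return y0  # past the last breakpoint: hold the final value

    # A skip rate decaying 0.1 -> 0.0 over the first 20k batches has long
    # since reached its floor at batch_count ~ 3.3M:
    print(scheduled_float(3_323_713.3, [(0.0, 0.1), (20_000.0, 0.0)]))  # 0.0
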
2023-11-26 09:23:36,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498550 2023-11-26 09:23:50,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3323713.3333333335, ans=0.0 2023-11-26 09:23:54,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3323713.3333333335, ans=0.125 2023-11-26 09:23:55,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3323713.3333333335, ans=0.1 2023-11-26 09:23:56,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3323713.3333333335, ans=0.0 2023-11-26 09:24:00,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 8.665e+01 9.160e+01 9.875e+01 1.167e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 09:24:06,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3323780.0, ans=0.035 2023-11-26 09:24:08,707 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5600, loss[loss=0.07453, simple_loss=0.1002, pruned_loss=0.01614, audio_tagging_loss=0.008313, over 14825.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09063, pruned_loss=0.01257, audio_tagging_loss=0.008861, over 3039442.82 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:24:17,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3323846.6666666665, ans=0.125 2023-11-26 09:24:24,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3323913.3333333335, ans=0.2 2023-11-26 09:24:32,143 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498600 2023-11-26 09:24:47,737 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:24:52,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3324113.3333333335, ans=0.0 2023-11-26 09:24:53,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3324113.3333333335, ans=0.2 2023-11-26 09:25:02,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-26 09:25:04,647 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5650, loss[loss=0.06189, simple_loss=0.08361, pruned_loss=0.01063, audio_tagging_loss=0.009448, over 15487.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08994, pruned_loss=0.01251, audio_tagging_loss=0.008969, over 3040860.55 frames.
], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:25:27,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498650 2023-11-26 09:25:28,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3324313.3333333335, ans=0.0 2023-11-26 09:25:35,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3324313.3333333335, ans=0.0 2023-11-26 09:25:51,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.711e+01 9.394e+01 1.016e+02 1.261e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 09:25:55,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3324446.6666666665, ans=0.125 2023-11-26 09:26:00,614 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5700, loss[loss=0.05738, simple_loss=0.08158, pruned_loss=0.009641, audio_tagging_loss=0.006948, over 15674.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09051, pruned_loss=0.0127, audio_tagging_loss=0.008896, over 3048347.59 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:26:22,927 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498700 2023-11-26 09:26:26,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3324646.6666666665, ans=0.125 2023-11-26 09:26:27,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3324646.6666666665, ans=0.0 2023-11-26 09:26:44,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2023-11-26 09:26:51,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=22.5 2023-11-26 09:26:55,475 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5750, loss[loss=0.08535, simple_loss=0.11, pruned_loss=0.01845, audio_tagging_loss=0.01191, over 15548.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09083, pruned_loss=0.01286, audio_tagging_loss=0.008723, over 3049276.02 frames. 
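], batch size: 56, lr: 1.61e-03, grad_scale: 16.0

In the optim.py entries, the five 'grad-norm quartiles' read as the min/25%/median/75%/max of recently observed gradient norms, and the logged threshold consistently equals Clipping_scale times the middle value (2.0 * 9.394e+01 = 1.879e+02 in the entry above). A reconstruction from the log rather than icefall's actual optimizer code:

    # Hedged reconstruction of the logged clipping statistics.
    import torch

    def clipping_stats(recent_grad_norms, clipping_scale=2.0):
        norms = torch.as_tensor(recent_grad_norms, dtype=torch.float32)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]          # scale times the median
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return q, threshold, percent_clipped

A gradient whose norm lands above the threshold would be rescaled down to it; the lone 5.552e+02 outlier a few entries below, which coincides with percent-clipped=1.0, is consistent with that reading.
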
2023-11-26 09:27:04,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3324846.6666666665, ans=0.125 2023-11-26 09:27:10,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3324913.3333333335, ans=0.125 2023-11-26 09:27:12,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3324913.3333333335, ans=0.125 2023-11-26 09:27:19,394 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498750 2023-11-26 09:27:28,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3325046.6666666665, ans=0.125 2023-11-26 09:27:29,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3325046.6666666665, ans=0.125 2023-11-26 09:27:43,274 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.688e+01 9.672e+01 1.037e+02 1.412e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 09:27:51,155 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5800, loss[loss=0.06108, simple_loss=0.07953, pruned_loss=0.01294, audio_tagging_loss=0.008369, over 14703.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09015, pruned_loss=0.01299, audio_tagging_loss=0.008723, over 3043320.88 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:28:15,011 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498800 2023-11-26 09:28:21,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3325313.3333333335, ans=0.125 2023-11-26 09:28:46,791 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5850, loss[loss=0.05983, simple_loss=0.08203, pruned_loss=0.01096, audio_tagging_loss=0.007855, over 14877.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09023, pruned_loss=0.01277, audio_tagging_loss=0.008717, over 3047716.67 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:28:46,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3325513.3333333335, ans=0.0 2023-11-26 09:28:56,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3325513.3333333335, ans=0.1 2023-11-26 09:29:07,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3325646.6666666665, ans=0.09899494936611666 2023-11-26 09:29:09,696 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498850 2023-11-26 09:29:35,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.712e+01 9.460e+01 1.009e+02 5.552e+02, threshold=1.892e+02, percent-clipped=1.0 2023-11-26 09:29:42,453 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5900, loss[loss=0.05539, simple_loss=0.08224, pruned_loss=0.00631, audio_tagging_loss=0.007959, over 16228.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.0908, pruned_loss=0.01278, audio_tagging_loss=0.008678, over 3054770.22 frames.
], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:29:45,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2023-11-26 09:29:48,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3325846.6666666665, ans=0.125 2023-11-26 09:29:53,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3325913.3333333335, ans=0.125 2023-11-26 09:30:05,950 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498900 2023-11-26 09:30:16,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3326046.6666666665, ans=0.0 2023-11-26 09:30:17,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3326046.6666666665, ans=0.125 2023-11-26 09:30:25,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-26 09:30:29,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-26 09:30:31,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2023-11-26 09:30:33,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3326113.3333333335, ans=0.2 2023-11-26 09:30:37,534 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5950, loss[loss=0.06268, simple_loss=0.09078, pruned_loss=0.009411, audio_tagging_loss=0.007881, over 14845.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08968, pruned_loss=0.01256, audio_tagging_loss=0.008676, over 3059013.44 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:30:40,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3326180.0, ans=0.09899494936611666 2023-11-26 09:30:42,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3326180.0, ans=0.125 2023-11-26 09:30:43,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3326180.0, ans=0.125 2023-11-26 09:30:50,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3326246.6666666665, ans=0.125 2023-11-26 09:31:01,968 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498950 2023-11-26 09:31:18,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3326380.0, ans=0.0 2023-11-26 09:31:25,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.795e+01 9.324e+01 1.011e+02 1.404e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 09:31:33,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3326513.3333333335, ans=0.125 2023-11-26 09:31:33,847 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6000, loss[loss=0.04966, simple_loss=0.05849, pruned_loss=0.008748, audio_tagging_loss=0.01167, over 14889.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08931, pruned_loss=0.01245, audio_tagging_loss=0.008627, over 3058097.69 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:31:33,849 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 09:32:03,087 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3459, 5.0378, 4.6611, 5.1751], device='cuda:2') 2023-11-26 09:32:06,559 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05807, simple_loss=0.05064, pruned_loss=0.005286, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 09:32:06,560 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 09:32:08,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3326513.3333333335, ans=0.125 2023-11-26 09:32:10,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-11-26 09:32:14,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3326513.3333333335, ans=0.1 2023-11-26 09:32:29,927 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499000 2023-11-26 09:32:45,648 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-26 09:32:50,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3326780.0, ans=0.125 2023-11-26 09:33:01,923 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6050, loss[loss=0.05548, simple_loss=0.07355, pruned_loss=0.009683, audio_tagging_loss=0.009017, over 15912.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08925, pruned_loss=0.01246, audio_tagging_loss=0.008707, over 3063082.40 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:33:07,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2023-11-26 09:33:24,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3326980.0, ans=0.125 2023-11-26 09:33:25,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499050 2023-11-26 09:33:34,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=22.5 2023-11-26 09:33:36,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3327046.6666666665, ans=0.125 2023-11-26 09:33:39,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2023-11-26 09:33:46,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3327113.3333333335, ans=0.125 2023-11-26 09:33:47,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3327113.3333333335, ans=0.0 2023-11-26 09:33:49,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.694e+01 9.426e+01 1.019e+02 1.507e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 09:33:54,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3327113.3333333335, ans=0.1 2023-11-26 09:33:58,293 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6100, loss[loss=0.07364, simple_loss=0.1065, pruned_loss=0.0139, audio_tagging_loss=0.006505, over 15116.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08948, pruned_loss=0.01247, audio_tagging_loss=0.008671, over 3063279.64 frames.
], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:34:02,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3327180.0, ans=0.125 2023-11-26 09:34:12,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3327246.6666666665, ans=0.125 2023-11-26 09:34:14,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3327246.6666666665, ans=0.125 2023-11-26 09:34:21,294 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499100 2023-11-26 09:34:38,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3327380.0, ans=0.0 2023-11-26 09:34:41,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3327380.0, ans=0.125 2023-11-26 09:34:54,337 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6150, loss[loss=0.06149, simple_loss=0.09136, pruned_loss=0.009173, audio_tagging_loss=0.006633, over 15103.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08892, pruned_loss=0.01212, audio_tagging_loss=0.008774, over 3058428.36 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:35:17,628 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499150 2023-11-26 09:35:42,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.630e+01 9.202e+01 1.002e+02 1.257e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 09:35:49,834 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6200, loss[loss=0.05588, simple_loss=0.06864, pruned_loss=0.009351, audio_tagging_loss=0.01222, over 14737.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08956, pruned_loss=0.01236, audio_tagging_loss=0.008787, over 3055764.71 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:36:06,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3327913.3333333335, ans=0.0 2023-11-26 09:36:08,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3327913.3333333335, ans=0.07 2023-11-26 09:36:08,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3327913.3333333335, ans=0.125 2023-11-26 09:36:13,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499200 2023-11-26 09:36:17,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-26 09:36:19,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3327980.0, ans=0.0 2023-11-26 09:36:23,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3328046.6666666665, ans=0.2 2023-11-26 09:36:46,678 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6250, loss[loss=0.06368, simple_loss=0.08703, pruned_loss=0.01034, audio_tagging_loss=0.009826, over 14304.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09012, pruned_loss=0.01244, audio_tagging_loss=0.008861, over 3058601.84 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:36:46,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3328180.0, ans=0.5 2023-11-26 09:37:09,548 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499250 2023-11-26 09:37:10,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3328313.3333333335, ans=0.2 2023-11-26 09:37:25,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-26 09:37:25,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3328380.0, ans=0.125 2023-11-26 09:37:36,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2023-11-26 09:37:36,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.808e+01 9.356e+01 1.010e+02 1.714e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 09:37:38,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3328446.6666666665, ans=0.2 2023-11-26 09:37:42,471 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6300, loss[loss=0.06506, simple_loss=0.09378, pruned_loss=0.009557, audio_tagging_loss=0.008611, over 16296.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08957, pruned_loss=0.01236, audio_tagging_loss=0.008964, over 3059698.44 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:37:42,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3328513.3333333335, ans=0.0 2023-11-26 09:37:56,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3328580.0, ans=0.125 2023-11-26 09:38:03,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3328646.6666666665, ans=0.2 2023-11-26 09:38:05,825 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499300 2023-11-26 09:38:07,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3328646.6666666665, ans=0.07 2023-11-26 09:38:14,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0 2023-11-26 09:38:33,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328780.0, ans=0.1 2023-11-26 09:38:34,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3328780.0, ans=0.125 2023-11-26 09:38:37,436 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6350, loss[loss=0.05848, simple_loss=0.08894, pruned_loss=0.007623, audio_tagging_loss=0.006384, over 14407.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09001, pruned_loss=0.0126, audio_tagging_loss=0.008997, over 3053389.02 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:38:37,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3328846.6666666665, ans=0.0 2023-11-26 09:39:00,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3328980.0, ans=0.0 2023-11-26 09:39:01,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499350 2023-11-26 09:39:01,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3328980.0, ans=0.125 2023-11-26 09:39:05,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3328980.0, ans=0.0 2023-11-26 09:39:09,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3328980.0, ans=0.0 2023-11-26 09:39:19,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-11-26 09:39:20,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5 2023-11-26 09:39:23,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3329113.3333333335, ans=0.1 2023-11-26 09:39:27,765 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.815e+01 9.462e+01 1.012e+02 1.581e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 09:39:33,588 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6400, loss[loss=0.04868, simple_loss=0.06196, pruned_loss=0.005403, audio_tagging_loss=0.0123, over 13828.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08944, pruned_loss=0.01259, audio_tagging_loss=0.009069, over 3046051.94 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:39:43,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3329246.6666666665, ans=0.125 2023-11-26 09:39:56,836 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499400 2023-11-26 09:40:06,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3329380.0, ans=0.0 2023-11-26 09:40:07,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3329380.0, ans=0.1 2023-11-26 09:40:15,203 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:40:23,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3329446.6666666665, ans=0.1 2023-11-26 09:40:23,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3329446.6666666665, ans=0.5 2023-11-26 09:40:29,270 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6450, loss[loss=0.08017, simple_loss=0.1141, pruned_loss=0.0165, audio_tagging_loss=0.006595, over 16189.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08973, pruned_loss=0.01253, audio_tagging_loss=0.009023, over 3047245.74 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:40:29,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2023-11-26 09:40:31,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.28 vs. limit=22.5 2023-11-26 09:40:35,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2023-11-26 09:40:47,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3329580.0, ans=0.125 2023-11-26 09:40:50,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3329646.6666666665, ans=0.125 2023-11-26 09:40:52,733 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499450 2023-11-26 09:40:55,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3329646.6666666665, ans=0.0 2023-11-26 09:41:12,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3329780.0, ans=0.125 2023-11-26 09:41:17,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.75 vs. limit=10.0 2023-11-26 09:41:19,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.730e+01 9.364e+01 9.937e+01 1.364e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:41:25,100 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6500, loss[loss=0.07665, simple_loss=0.0986, pruned_loss=0.01798, audio_tagging_loss=0.009372, over 14718.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08999, pruned_loss=0.01243, audio_tagging_loss=0.008892, over 3041986.29 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:41:48,817 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499500 2023-11-26 09:41:57,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3330046.6666666665, ans=0.0 2023-11-26 09:42:16,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3330113.3333333335, ans=0.0 2023-11-26 09:42:16,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3330113.3333333335, ans=0.0 2023-11-26 09:42:20,847 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6550, loss[loss=0.07786, simple_loss=0.1147, pruned_loss=0.01529, audio_tagging_loss=0.005199, over 15853.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09074, pruned_loss=0.0125, audio_tagging_loss=0.008739, over 3048461.27 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:42:31,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3330246.6666666665, ans=0.125 2023-11-26 09:42:41,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3330313.3333333335, ans=0.2 2023-11-26 09:42:43,318 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-26 09:42:43,932 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499550 2023-11-26 09:42:49,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3330313.3333333335, ans=0.125 2023-11-26 09:42:57,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3330380.0, ans=0.125 2023-11-26 09:43:04,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3330446.6666666665, ans=0.0 2023-11-26 09:43:07,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.77 vs. limit=10.0 2023-11-26 09:43:07,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2023-11-26 09:43:11,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.534e+01 9.204e+01 9.909e+01 1.481e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 09:43:16,669 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6600, loss[loss=0.05957, simple_loss=0.0884, pruned_loss=0.008629, audio_tagging_loss=0.006739, over 15426.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09037, pruned_loss=0.01246, audio_tagging_loss=0.008662, over 3048920.60 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:43:16,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3330513.3333333335, ans=0.2 2023-11-26 09:43:21,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3330513.3333333335, ans=0.2 2023-11-26 09:43:24,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3330513.3333333335, ans=0.0 2023-11-26 09:43:35,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3330580.0, ans=0.125 2023-11-26 09:43:37,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3330646.6666666665, ans=0.0 2023-11-26 09:43:39,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499600 2023-11-26 09:43:48,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3330646.6666666665, ans=10.0 2023-11-26 09:43:48,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3330646.6666666665, ans=0.0 2023-11-26 09:43:56,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2023-11-26 09:44:04,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3330780.0, ans=0.95 2023-11-26 09:44:09,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3330780.0, ans=0.125 2023-11-26 09:44:11,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3330846.6666666665, ans=0.125 2023-11-26 09:44:11,838 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6650, loss[loss=0.06068, simple_loss=0.08739, pruned_loss=0.009712, audio_tagging_loss=0.007277, over 14171.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.0902, pruned_loss=0.01253, audio_tagging_loss=0.00863, over 3044567.48 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:44:12,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3330846.6666666665, ans=0.0 2023-11-26 09:44:15,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-26 09:44:20,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3330846.6666666665, ans=0.0 2023-11-26 09:44:36,716 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499650 2023-11-26 09:44:39,122 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:44:42,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3330980.0, ans=0.125 2023-11-26 09:44:48,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. 
limit=15.0 2023-11-26 09:45:02,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-26 09:45:02,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.692e+01 9.274e+01 1.004e+02 1.194e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 09:45:07,946 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6700, loss[loss=0.07081, simple_loss=0.1009, pruned_loss=0.0132, audio_tagging_loss=0.00716, over 15628.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09059, pruned_loss=0.01257, audio_tagging_loss=0.008575, over 3038032.11 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:45:10,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3331180.0, ans=0.0 2023-11-26 09:45:22,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3331246.6666666665, ans=0.1 2023-11-26 09:45:30,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3331313.3333333335, ans=0.125 2023-11-26 09:45:31,205 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499700 2023-11-26 09:45:42,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3331380.0, ans=15.0 2023-11-26 09:46:00,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0 2023-11-26 09:46:04,065 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6750, loss[loss=0.06586, simple_loss=0.08946, pruned_loss=0.0112, audio_tagging_loss=0.009929, over 14841.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09026, pruned_loss=0.0124, audio_tagging_loss=0.008586, over 3039925.46 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:46:05,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-26 09:46:10,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3331513.3333333335, ans=0.125 2023-11-26 09:46:26,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499750 2023-11-26 09:46:32,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3331646.6666666665, ans=0.125 2023-11-26 09:46:37,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2023-11-26 09:46:46,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3331713.3333333335, ans=0.0 2023-11-26 09:46:50,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.21 vs. 
limit=15.0 2023-11-26 09:46:52,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3331780.0, ans=0.125 2023-11-26 09:46:53,938 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.795e+01 9.376e+01 1.027e+02 1.751e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:46:54,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3331780.0, ans=0.125 2023-11-26 09:46:59,205 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6800, loss[loss=0.05139, simple_loss=0.06124, pruned_loss=0.008321, audio_tagging_loss=0.01245, over 15730.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09002, pruned_loss=0.01242, audio_tagging_loss=0.008687, over 3041425.63 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:47:00,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3331846.6666666665, ans=0.125 2023-11-26 09:47:01,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3331846.6666666665, ans=0.125 2023-11-26 09:47:18,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=12.0 2023-11-26 09:47:23,607 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499800 2023-11-26 09:47:24,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3331980.0, ans=0.1 2023-11-26 09:47:55,092 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6850, loss[loss=0.05908, simple_loss=0.0826, pruned_loss=0.009007, audio_tagging_loss=0.008776, over 15200.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08949, pruned_loss=0.01223, audio_tagging_loss=0.008693, over 3029834.89 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:47:57,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3332180.0, ans=0.125 2023-11-26 09:48:13,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3332246.6666666665, ans=0.125 2023-11-26 09:48:18,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499850 2023-11-26 09:48:23,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3332313.3333333335, ans=0.1 2023-11-26 09:48:25,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3332313.3333333335, ans=0.2 2023-11-26 09:48:36,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. 
limit=22.5 2023-11-26 09:48:43,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3332446.6666666665, ans=0.0 2023-11-26 09:48:45,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.735e+01 9.366e+01 1.004e+02 1.286e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:48:45,502 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:48:50,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3332513.3333333335, ans=0.0 2023-11-26 09:48:50,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3332513.3333333335, ans=0.1 2023-11-26 09:48:51,078 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6900, loss[loss=0.07482, simple_loss=0.1089, pruned_loss=0.01304, audio_tagging_loss=0.007348, over 15996.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.0896, pruned_loss=0.01224, audio_tagging_loss=0.008699, over 3039775.02 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:48:55,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=22.5 2023-11-26 09:48:59,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-26 09:49:09,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3332580.0, ans=0.07 2023-11-26 09:49:13,450 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499900 2023-11-26 09:49:14,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3332646.6666666665, ans=0.0 2023-11-26 09:49:33,062 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:49:36,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3332780.0, ans=0.125 2023-11-26 09:49:40,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-11-26 09:49:41,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3332780.0, ans=0.125 2023-11-26 09:49:45,788 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6950, loss[loss=0.06249, simple_loss=0.09069, pruned_loss=0.01164, audio_tagging_loss=0.005502, over 14308.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08941, pruned_loss=0.01221, audio_tagging_loss=0.008739, over 3043364.59 frames. 
], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:49:49,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3332846.6666666665, ans=0.125 2023-11-26 09:49:50,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3332846.6666666665, ans=0.125 2023-11-26 09:50:05,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3332913.3333333335, ans=0.0 2023-11-26 09:50:09,628 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499950 2023-11-26 09:50:22,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2023-11-26 09:50:24,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3333046.6666666665, ans=0.07 2023-11-26 09:50:35,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.609e+01 9.217e+01 1.000e+02 1.228e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 09:50:40,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333180.0, ans=0.1 2023-11-26 09:50:41,407 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7000, loss[loss=0.0708, simple_loss=0.09244, pruned_loss=0.01754, audio_tagging_loss=0.007041, over 14442.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08941, pruned_loss=0.01239, audio_tagging_loss=0.00877, over 3045681.30 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:50:41,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. 
limit=15.0 2023-11-26 09:50:44,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3333180.0, ans=0.1 2023-11-26 09:51:05,437 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500000 2023-11-26 09:51:06,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3333313.3333333335, ans=0.125 2023-11-26 09:51:16,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3333380.0, ans=0.125 2023-11-26 09:51:20,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3333380.0, ans=0.0 2023-11-26 09:51:22,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333380.0, ans=0.1 2023-11-26 09:51:28,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3333446.6666666665, ans=0.125 2023-11-26 09:51:28,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3333446.6666666665, ans=0.125 2023-11-26 09:51:35,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3333446.6666666665, ans=0.05 2023-11-26 09:51:36,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3333446.6666666665, ans=0.1 2023-11-26 09:51:40,273 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7050, loss[loss=0.04412, simple_loss=0.06337, pruned_loss=0.003792, audio_tagging_loss=0.008641, over 15427.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09027, pruned_loss=0.01241, audio_tagging_loss=0.008705, over 3047704.66 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:51:59,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3333580.0, ans=0.04949747468305833 2023-11-26 09:51:59,881 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-26 09:52:02,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500050 2023-11-26 09:52:02,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-26 09:52:09,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-26 09:52:19,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3333713.3333333335, ans=0.125 2023-11-26 09:52:29,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3333780.0, ans=0.0 2023-11-26 09:52:31,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.700e+01 9.418e+01 1.001e+02 1.210e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 09:52:35,493 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7100, loss[loss=0.05635, simple_loss=0.07735, pruned_loss=0.008644, audio_tagging_loss=0.009035, over 14635.00 frames. 
], tot_loss[loss=0.06557, simple_loss=0.08907, pruned_loss=0.01224, audio_tagging_loss=0.008796, over 3048919.98 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:52:52,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3333913.3333333335, ans=0.0 2023-11-26 09:52:55,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3333913.3333333335, ans=0.125 2023-11-26 09:52:58,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500100 2023-11-26 09:53:02,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3333980.0, ans=0.05 2023-11-26 09:53:11,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=22.5 2023-11-26 09:53:19,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-11-26 09:53:26,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-26 09:53:30,079 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7150, loss[loss=0.04828, simple_loss=0.05716, pruned_loss=0.008389, audio_tagging_loss=0.01131, over 14442.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08901, pruned_loss=0.01219, audio_tagging_loss=0.00883, over 3043268.42 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:53:46,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=22.5 2023-11-26 09:53:54,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500150 2023-11-26 09:54:21,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.683e+01 9.180e+01 1.005e+02 1.351e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 09:54:24,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-26 09:54:26,311 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7200, loss[loss=0.0699, simple_loss=0.09199, pruned_loss=0.01445, audio_tagging_loss=0.00945, over 15175.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08981, pruned_loss=0.0124, audio_tagging_loss=0.008863, over 3042725.52 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:54:31,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3334513.3333333335, ans=0.0 2023-11-26 09:54:31,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3334513.3333333335, ans=0.125 2023-11-26 09:54:34,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.85 vs. 
limit=15.0 2023-11-26 09:54:41,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3334580.0, ans=0.2 2023-11-26 09:54:49,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=22.5 2023-11-26 09:54:49,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500200 2023-11-26 09:55:08,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-26 09:55:08,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3334713.3333333335, ans=0.125 2023-11-26 09:55:11,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2023-11-26 09:55:21,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3334846.6666666665, ans=0.125 2023-11-26 09:55:22,856 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7250, loss[loss=0.07513, simple_loss=0.1036, pruned_loss=0.01472, audio_tagging_loss=0.008629, over 15618.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08978, pruned_loss=0.01227, audio_tagging_loss=0.008914, over 3038911.45 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:55:25,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3334846.6666666665, ans=0.1 2023-11-26 09:55:30,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3334846.6666666665, ans=0.125 2023-11-26 09:55:34,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2023-11-26 09:55:45,985 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500250 2023-11-26 09:56:12,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3335113.3333333335, ans=0.2 2023-11-26 09:56:15,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.852e+01 9.361e+01 1.013e+02 1.372e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:56:15,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3335113.3333333335, ans=0.125 2023-11-26 09:56:18,397 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7300, loss[loss=0.06074, simple_loss=0.08503, pruned_loss=0.01124, audio_tagging_loss=0.006983, over 14605.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08905, pruned_loss=0.01215, audio_tagging_loss=0.008966, over 3035764.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:56:32,449 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:56:40,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. 
limit=15.0 2023-11-26 09:56:41,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3335313.3333333335, ans=0.05 2023-11-26 09:56:42,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500300 2023-11-26 09:56:49,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3335313.3333333335, ans=0.125 2023-11-26 09:56:51,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3335380.0, ans=0.125 2023-11-26 09:56:53,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3335380.0, ans=0.0 2023-11-26 09:56:58,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3335380.0, ans=0.125 2023-11-26 09:56:58,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3335380.0, ans=0.0 2023-11-26 09:57:02,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3335446.6666666665, ans=0.125 2023-11-26 09:57:07,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3335446.6666666665, ans=0.125 2023-11-26 09:57:14,265 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7350, loss[loss=0.07275, simple_loss=0.1008, pruned_loss=0.01417, audio_tagging_loss=0.008155, over 14985.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08899, pruned_loss=0.01217, audio_tagging_loss=0.008866, over 3036066.26 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:57:15,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3335513.3333333335, ans=0.125 2023-11-26 09:57:16,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335513.3333333335, ans=0.1 2023-11-26 09:57:37,773 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-26 09:57:49,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3335713.3333333335, ans=0.0 2023-11-26 09:57:54,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3335713.3333333335, ans=0.2 2023-11-26 09:58:00,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3335780.0, ans=0.2 2023-11-26 09:58:02,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3335780.0, ans=0.125 2023-11-26 09:58:06,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.616e+01 9.240e+01 1.003e+02 1.313e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 09:58:09,959 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7400, loss[loss=0.06256, simple_loss=0.07646, pruned_loss=0.01358, audio_tagging_loss=0.01076, over 14702.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0895, pruned_loss=0.0122, audio_tagging_loss=0.008691, over 3037386.09 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:58:15,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3335846.6666666665, ans=0.125 2023-11-26 09:58:26,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3335913.3333333335, ans=0.05 2023-11-26 09:58:27,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0 2023-11-26 09:58:33,288 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-26 09:59:05,894 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7450, loss[loss=0.06969, simple_loss=0.08657, pruned_loss=0.01727, audio_tagging_loss=0.009133, over 13677.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0897, pruned_loss=0.01227, audio_tagging_loss=0.008624, over 3033278.41 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:59:07,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3336180.0, ans=0.125 2023-11-26 09:59:14,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-11-26 09:59:29,771 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-26 09:59:58,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.877e+01 9.404e+01 1.015e+02 1.370e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 10:00:01,514 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7500, loss[loss=0.06134, simple_loss=0.0882, pruned_loss=0.008393, audio_tagging_loss=0.008849, over 14998.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08998, pruned_loss=0.01214, audio_tagging_loss=0.00864, over 3033772.18 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:00:02,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3336513.3333333335, ans=0.125 2023-11-26 10:00:14,341 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:00:17,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2023-11-26 10:00:25,307 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-26 10:00:37,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3336713.3333333335, ans=0.2 2023-11-26 10:00:41,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3336713.3333333335, ans=0.125 2023-11-26 10:00:57,572 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7550, loss[loss=0.05595, simple_loss=0.07635, pruned_loss=0.0097, audio_tagging_loss=0.008073, over 14741.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08992, pruned_loss=0.01212, audio_tagging_loss=0.008608, over 3036263.90 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:00:59,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3336846.6666666665, ans=0.0 2023-11-26 10:01:01,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3336846.6666666665, ans=0.1 2023-11-26 10:01:04,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3336846.6666666665, ans=0.0 2023-11-26 10:01:05,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3336846.6666666665, ans=0.125 2023-11-26 10:01:20,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3336980.0, ans=0.125 2023-11-26 10:01:21,051 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-26 10:01:22,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3336980.0, ans=0.1 2023-11-26 10:01:25,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3336980.0, ans=0.0 2023-11-26 10:01:39,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3337046.6666666665, ans=0.95 2023-11-26 10:01:49,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.478e+01 8.999e+01 9.554e+01 1.187e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-26 10:01:51,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3337113.3333333335, ans=22.5 2023-11-26 10:01:53,387 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7600, loss[loss=0.05601, simple_loss=0.07543, pruned_loss=0.008271, audio_tagging_loss=0.01003, over 15097.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08874, pruned_loss=0.01201, audio_tagging_loss=0.008658, over 3036576.05 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:01:55,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3337180.0, ans=0.1 2023-11-26 10:01:57,713 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:02:17,235 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-26 10:02:19,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3337313.3333333335, ans=0.0 2023-11-26 10:02:21,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.27 vs. 
limit=15.0 2023-11-26 10:02:32,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3337380.0, ans=0.125 2023-11-26 10:02:36,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3337380.0, ans=0.125 2023-11-26 10:02:49,121 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7650, loss[loss=0.05207, simple_loss=0.0734, pruned_loss=0.006157, audio_tagging_loss=0.009217, over 15590.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08964, pruned_loss=0.01215, audio_tagging_loss=0.008592, over 3042341.88 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:02:50,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3337513.3333333335, ans=0.0 2023-11-26 10:02:51,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3337513.3333333335, ans=0.125 2023-11-26 10:03:12,818 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-26 10:03:15,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0 2023-11-26 10:03:27,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3337713.3333333335, ans=0.2 2023-11-26 10:03:41,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.670e+01 9.156e+01 1.010e+02 1.245e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 10:03:45,678 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7700, loss[loss=0.05845, simple_loss=0.07975, pruned_loss=0.009624, audio_tagging_loss=0.008955, over 16361.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08927, pruned_loss=0.01212, audio_tagging_loss=0.008645, over 3044002.84 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:03:52,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-26 10:04:00,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.74 vs. limit=15.0 2023-11-26 10:04:08,459 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-26 10:04:19,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3338046.6666666665, ans=0.125 2023-11-26 10:04:37,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3338113.3333333335, ans=0.0 2023-11-26 10:04:38,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3338113.3333333335, ans=0.125 2023-11-26 10:04:40,557 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7750, loss[loss=0.07876, simple_loss=0.1116, pruned_loss=0.01613, audio_tagging_loss=0.006814, over 15055.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09057, pruned_loss=0.01245, audio_tagging_loss=0.008725, over 3045793.80 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:04:42,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-26 10:04:45,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-26 10:05:04,473 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-26 10:05:14,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3338380.0, ans=0.1 2023-11-26 10:05:32,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 9.045e+01 9.530e+01 1.049e+02 1.522e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 10:05:36,764 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7800, loss[loss=0.04852, simple_loss=0.06089, pruned_loss=0.007248, audio_tagging_loss=0.01083, over 14527.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08948, pruned_loss=0.0125, audio_tagging_loss=0.008774, over 3031366.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:05:38,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3338513.3333333335, ans=0.125 2023-11-26 10:05:47,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3338580.0, ans=0.125 2023-11-26 10:05:51,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3338580.0, ans=0.05 2023-11-26 10:05:54,279 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-26 10:06:00,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-26 10:06:03,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2023-11-26 10:06:26,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3338780.0, ans=0.1 2023-11-26 10:06:32,787 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7850, loss[loss=0.06375, simple_loss=0.08536, pruned_loss=0.01403, audio_tagging_loss=0.00704, over 14928.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08969, pruned_loss=0.01257, audio_tagging_loss=0.008841, over 3033662.03 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:06:34,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3338846.6666666665, ans=0.1 2023-11-26 10:06:49,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. limit=6.0 2023-11-26 10:06:55,632 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-26 10:07:25,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.991e+01 9.385e+01 9.905e+01 1.223e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 10:07:28,424 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7900, loss[loss=0.04983, simple_loss=0.06208, pruned_loss=0.008548, audio_tagging_loss=0.01024, over 14729.00 frames. 
], tot_loss[loss=0.06611, simple_loss=0.0895, pruned_loss=0.01244, audio_tagging_loss=0.008919, over 3035055.02 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:07:46,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3339246.6666666665, ans=0.125 2023-11-26 10:07:52,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-26 10:07:52,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3339313.3333333335, ans=0.125 2023-11-26 10:07:55,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3339313.3333333335, ans=0.125 2023-11-26 10:08:23,020 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7950, loss[loss=0.08739, simple_loss=0.1164, pruned_loss=0.02, audio_tagging_loss=0.009199, over 15378.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08928, pruned_loss=0.0124, audio_tagging_loss=0.009012, over 3038999.85 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:08:38,289 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:08:47,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-26 10:08:51,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3339646.6666666665, ans=0.125 2023-11-26 10:09:04,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=12.0 2023-11-26 10:09:12,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3339780.0, ans=15.0 2023-11-26 10:09:15,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.923e+01 9.464e+01 1.020e+02 1.321e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 10:09:19,573 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8000, loss[loss=0.07682, simple_loss=0.1036, pruned_loss=0.01759, audio_tagging_loss=0.007412, over 15098.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08957, pruned_loss=0.01239, audio_tagging_loss=0.009082, over 3042278.65 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:09:36,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3339913.3333333335, ans=0.0 2023-11-26 10:09:39,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3339913.3333333335, ans=0.2 2023-11-26 10:09:42,389 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-26 10:09:46,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3339980.0, ans=0.125 2023-11-26 10:10:04,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3340113.3333333335, ans=0.04949747468305833 2023-11-26 10:10:06,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3340113.3333333335, ans=0.125 2023-11-26 10:10:12,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-11-26 10:10:14,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3340180.0, ans=0.95 2023-11-26 10:10:15,404 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8050, loss[loss=0.06325, simple_loss=0.08937, pruned_loss=0.01032, audio_tagging_loss=0.008243, over 15064.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.0886, pruned_loss=0.01231, audio_tagging_loss=0.009122, over 3042681.10 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:10:22,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3340180.0, ans=0.0 2023-11-26 10:10:34,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3340246.6666666665, ans=0.125 2023-11-26 10:10:38,625 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-26 10:10:43,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3340313.3333333335, ans=0.2 2023-11-26 10:10:44,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3340313.3333333335, ans=0.2 2023-11-26 10:10:50,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0 2023-11-26 10:10:54,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3340380.0, ans=0.125 2023-11-26 10:10:58,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3340380.0, ans=0.0 2023-11-26 10:11:07,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.742e+01 9.437e+01 9.965e+01 1.262e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 10:11:10,525 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8100, loss[loss=0.06103, simple_loss=0.08922, pruned_loss=0.01036, audio_tagging_loss=0.006062, over 14781.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08934, pruned_loss=0.01241, audio_tagging_loss=0.008982, over 3043402.42 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:11:23,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3340580.0, ans=0.125 2023-11-26 10:11:35,119 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-26 10:11:39,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3340646.6666666665, ans=0.125 2023-11-26 10:11:57,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=22.5 2023-11-26 10:12:05,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3340846.6666666665, ans=0.1 2023-11-26 10:12:07,295 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8150, loss[loss=0.05466, simple_loss=0.07174, pruned_loss=0.009894, audio_tagging_loss=0.008897, over 14268.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09009, pruned_loss=0.01248, audio_tagging_loss=0.008811, over 3043579.93 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:12:15,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3340846.6666666665, ans=0.125 2023-11-26 10:12:25,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3340913.3333333335, ans=0.09899494936611666 2023-11-26 10:12:30,314 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-26 10:12:31,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3340980.0, ans=0.2 2023-11-26 10:12:34,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3340980.0, ans=0.1 2023-11-26 10:12:39,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3341046.6666666665, ans=0.125 2023-11-26 10:12:42,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3341046.6666666665, ans=0.1 2023-11-26 10:12:45,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3341046.6666666665, ans=0.1 2023-11-26 10:13:00,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.798e+01 9.302e+01 1.005e+02 1.230e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 10:13:02,549 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8200, loss[loss=0.08658, simple_loss=0.1219, pruned_loss=0.01583, audio_tagging_loss=0.009803, over 15791.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08998, pruned_loss=0.01225, audio_tagging_loss=0.008852, over 3046539.55 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:13:03,691 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:13:05,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=3341180.0, ans=12.0 2023-11-26 10:13:15,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3341246.6666666665, ans=0.125 2023-11-26 10:13:20,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3341246.6666666665, ans=0.0 2023-11-26 10:13:25,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-26 10:13:26,849 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2023-11-26 10:13:40,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3341380.0, ans=0.0 2023-11-26 10:13:46,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3341446.6666666665, ans=0.0 2023-11-26 10:13:48,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3341446.6666666665, ans=0.125 2023-11-26 10:13:56,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341513.3333333335, ans=0.1 2023-11-26 10:13:57,318 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8250, loss[loss=0.071, simple_loss=0.1007, pruned_loss=0.01114, audio_tagging_loss=0.009528, over 14857.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08993, pruned_loss=0.01236, audio_tagging_loss=0.008804, over 3041349.11 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:13:57,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3341513.3333333335, ans=0.125 2023-11-26 10:14:01,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-26 10:14:11,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.19 vs. 
limit=15.0 2023-11-26 10:14:21,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-26 10:14:29,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3341646.6666666665, ans=0.0 2023-11-26 10:14:37,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3341713.3333333335, ans=0.2 2023-11-26 10:14:40,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3341780.0, ans=0.2 2023-11-26 10:14:46,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3341780.0, ans=0.95 2023-11-26 10:14:48,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3341780.0, ans=0.125 2023-11-26 10:14:50,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.840e+01 9.471e+01 1.008e+02 1.505e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 10:14:51,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3341846.6666666665, ans=0.0 2023-11-26 10:14:52,787 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8300, loss[loss=0.07157, simple_loss=0.09893, pruned_loss=0.01292, audio_tagging_loss=0.009182, over 15360.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08999, pruned_loss=0.01246, audio_tagging_loss=0.00881, over 3042528.19 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:15:12,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3341913.3333333335, ans=0.2 2023-11-26 10:15:16,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-26 10:15:49,227 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8350, loss[loss=0.06469, simple_loss=0.08639, pruned_loss=0.0111, audio_tagging_loss=0.0104, over 15783.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08966, pruned_loss=0.01228, audio_tagging_loss=0.008795, over 3048917.40 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:16:11,936 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-26 10:16:14,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3342313.3333333335, ans=0.2 2023-11-26 10:16:22,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3342380.0, ans=10.0 2023-11-26 10:16:22,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.77 vs. 
limit=22.5 2023-11-26 10:16:28,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3342380.0, ans=0.125 2023-11-26 10:16:35,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3342446.6666666665, ans=0.125 2023-11-26 10:16:42,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.918e+01 9.503e+01 1.018e+02 1.589e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 10:16:44,283 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8400, loss[loss=0.03488, simple_loss=0.04302, pruned_loss=0.003172, audio_tagging_loss=0.0102, over 15159.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08976, pruned_loss=0.01241, audio_tagging_loss=0.008722, over 3048064.65 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:16:47,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.74 vs. limit=10.0 2023-11-26 10:17:08,418 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-26 10:17:31,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2023-11-26 10:17:32,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3342780.0, ans=0.1 2023-11-26 10:17:36,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2023-11-26 10:17:39,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3342846.6666666665, ans=0.0 2023-11-26 10:17:40,347 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8450, loss[loss=0.06233, simple_loss=0.07901, pruned_loss=0.01382, audio_tagging_loss=0.009009, over 14890.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08952, pruned_loss=0.01235, audio_tagging_loss=0.008732, over 3047777.81 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:17:47,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3342846.6666666665, ans=0.125 2023-11-26 10:17:49,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3342846.6666666665, ans=0.2 2023-11-26 10:18:03,586 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-26 10:18:23,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3343113.3333333335, ans=0.125 2023-11-26 10:18:23,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3343113.3333333335, ans=0.2 2023-11-26 10:18:29,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. 
limit=15.0 2023-11-26 10:18:33,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.791e+01 9.317e+01 9.949e+01 1.409e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 10:18:36,368 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8500, loss[loss=0.06271, simple_loss=0.08841, pruned_loss=0.00859, audio_tagging_loss=0.009912, over 15265.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08845, pruned_loss=0.01224, audio_tagging_loss=0.008787, over 3050960.22 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:18:59,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-26 10:19:29,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3343446.6666666665, ans=0.0 2023-11-26 10:19:31,398 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8550, loss[loss=0.04658, simple_loss=0.05837, pruned_loss=0.007208, audio_tagging_loss=0.01019, over 16043.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08898, pruned_loss=0.01237, audio_tagging_loss=0.008766, over 3050933.82 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:19:44,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3343580.0, ans=0.125 2023-11-26 10:19:48,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3343580.0, ans=0.2 2023-11-26 10:19:54,769 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-26 10:20:00,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3343646.6666666665, ans=0.125 2023-11-26 10:20:02,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3343646.6666666665, ans=0.1 2023-11-26 10:20:02,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3343646.6666666665, ans=0.0 2023-11-26 10:20:09,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2023-11-26 10:20:24,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.767e+01 9.434e+01 1.006e+02 1.411e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 10:20:26,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3343846.6666666665, ans=0.0 2023-11-26 10:20:26,931 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8600, loss[loss=0.07327, simple_loss=0.1093, pruned_loss=0.01368, audio_tagging_loss=0.004927, over 14882.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08799, pruned_loss=0.01213, audio_tagging_loss=0.008945, over 3042910.03 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:20:32,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. 
limit=6.0 2023-11-26 10:20:35,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3343846.6666666665, ans=0.2 2023-11-26 10:20:45,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3343913.3333333335, ans=0.0 2023-11-26 10:20:50,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3343980.0, ans=0.025 2023-11-26 10:20:50,919 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501600 2023-11-26 10:20:54,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3343980.0, ans=0.0 2023-11-26 10:20:56,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-26 10:20:58,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3343980.0, ans=0.0 2023-11-26 10:21:00,219 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2023-11-26 10:21:08,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2023-11-26 10:21:11,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3344113.3333333335, ans=0.125 2023-11-26 10:21:20,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3344113.3333333335, ans=0.125 2023-11-26 10:21:23,387 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8650, loss[loss=0.05331, simple_loss=0.06634, pruned_loss=0.01071, audio_tagging_loss=0.009438, over 15094.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08887, pruned_loss=0.01231, audio_tagging_loss=0.008948, over 3040726.54 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:21:25,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2023-11-26 10:21:26,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3344180.0, ans=15.0 2023-11-26 10:21:35,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.21 vs. 
limit=12.0 2023-11-26 10:21:46,186 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501650 2023-11-26 10:21:50,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3344313.3333333335, ans=0.125 2023-11-26 10:21:53,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3344313.3333333335, ans=0.125 2023-11-26 10:22:11,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3344446.6666666665, ans=0.2 2023-11-26 10:22:18,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.843e+01 9.565e+01 1.046e+02 1.310e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 10:22:19,249 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8700, loss[loss=0.06467, simple_loss=0.08271, pruned_loss=0.01586, audio_tagging_loss=0.007455, over 14365.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08918, pruned_loss=0.01246, audio_tagging_loss=0.009038, over 3034875.34 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:22:19,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3344513.3333333335, ans=0.125 2023-11-26 10:22:34,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3344580.0, ans=0.125 2023-11-26 10:22:42,578 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501700 2023-11-26 10:23:06,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3344780.0, ans=0.125 2023-11-26 10:23:08,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3344780.0, ans=0.2 2023-11-26 10:23:14,990 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8750, loss[loss=0.07834, simple_loss=0.1118, pruned_loss=0.01439, audio_tagging_loss=0.008063, over 16482.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09082, pruned_loss=0.01269, audio_tagging_loss=0.00905, over 3043363.44 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:23:15,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5 2023-11-26 10:23:38,511 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501750 2023-11-26 10:24:09,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 8.969e+01 9.428e+01 9.946e+01 1.483e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 10:24:10,485 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8800, loss[loss=0.05135, simple_loss=0.06435, pruned_loss=0.007821, audio_tagging_loss=0.01135, over 14650.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09164, pruned_loss=0.01276, audio_tagging_loss=0.009043, over 3045838.96 frames. 
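The per-batch records above report four loss components, and their relationship can be recovered from the logged numbers themselves: in every record here the total is consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, while tot_loss aggregates the same quantity over the frames seen so far in the epoch. A minimal sketch (the function name is illustrative, not from the training code):

def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Reconstructed from the logged values, e.g. the batch 8800 record above:
    # 0.5 * 0.06435 + 0.007821 + 1.0 * 0.01135 = 0.051346 ~= 0.05135
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

assert abs(combine_losses(0.06435, 0.007821, 0.01135) - 0.05135) < 1e-4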
], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:24:10,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3345180.0, ans=0.125 2023-11-26 10:24:14,474 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:24:33,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3345313.3333333335, ans=0.1 2023-11-26 10:24:34,013 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501800 2023-11-26 10:24:41,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3345313.3333333335, ans=0.07 2023-11-26 10:25:06,488 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8850, loss[loss=0.06414, simple_loss=0.09453, pruned_loss=0.009154, audio_tagging_loss=0.007718, over 15880.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09134, pruned_loss=0.0127, audio_tagging_loss=0.009042, over 3047333.56 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:25:18,736 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:25:24,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-26 10:25:29,778 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501850 2023-11-26 10:25:32,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3345646.6666666665, ans=0.0 2023-11-26 10:25:40,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3345713.3333333335, ans=0.2 2023-11-26 10:25:53,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-26 10:25:57,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3345780.0, ans=0.125 2023-11-26 10:26:01,683 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8900, loss[loss=0.08097, simple_loss=0.1112, pruned_loss=0.01802, audio_tagging_loss=0.007329, over 13969.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09024, pruned_loss=0.01256, audio_tagging_loss=0.008999, over 3046748.89 frames. 
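The WARNING above makes the exclusion criterion visible: the 1-second AudioSet cut has 100 feature frames, only 23 frames survive the convolutional front-end, and 23 frames cannot align the 24 placeholder tokens, so the transducer loss would be undefined. A sketch of such a filter; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23, not copied from the training code:

def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 stages: ((100 - 7) // 2 + 1) // 2 == 23,
    # consistent with the counts in the warning above.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop any cut with more BPE tokens than post-subsampling frames.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the dummy-text cut above is excluded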
], batch size: 54, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:26:02,718 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.662e+01 9.297e+01 1.004e+02 1.286e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 10:26:25,478 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501900 2023-11-26 10:26:31,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3345980.0, ans=0.1 2023-11-26 10:26:33,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3345980.0, ans=0.0 2023-11-26 10:26:36,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3346046.6666666665, ans=0.0 2023-11-26 10:26:48,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3346113.3333333335, ans=0.0 2023-11-26 10:26:57,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3346180.0, ans=0.125 2023-11-26 10:26:57,798 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8950, loss[loss=0.06984, simple_loss=0.09885, pruned_loss=0.01331, audio_tagging_loss=0.007107, over 15904.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0904, pruned_loss=0.01255, audio_tagging_loss=0.008889, over 3051806.04 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:26:59,304 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=22.5 2023-11-26 10:27:02,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3346180.0, ans=0.0 2023-11-26 10:27:05,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3346180.0, ans=0.1 2023-11-26 10:27:14,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3346246.6666666665, ans=0.2 2023-11-26 10:27:14,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3346246.6666666665, ans=22.5 2023-11-26 10:27:15,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3346246.6666666665, ans=0.0 2023-11-26 10:27:19,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3346313.3333333335, ans=0.125 2023-11-26 10:27:20,539 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501950 2023-11-26 10:27:24,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3346313.3333333335, ans=0.125 2023-11-26 10:27:27,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-26 10:27:46,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3346446.6666666665, ans=0.1 2023-11-26 10:27:48,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.49 vs. 
limit=15.0 2023-11-26 10:27:53,376 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9000, loss[loss=0.06412, simple_loss=0.09162, pruned_loss=0.01226, audio_tagging_loss=0.006049, over 15954.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09057, pruned_loss=0.01255, audio_tagging_loss=0.008726, over 3055496.96 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:27:53,377 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 10:28:26,049 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05901, simple_loss=0.0506, pruned_loss=0.005264, audio_tagging_loss=0.02845, over 4681554.00 frames. 2023-11-26 10:28:26,050 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 10:28:27,065 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.869e+01 9.478e+01 9.908e+01 1.192e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 10:28:38,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3346580.0, ans=0.0 2023-11-26 10:28:43,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2023-11-26 10:28:49,210 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502000 2023-11-26 10:28:49,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3346646.6666666665, ans=0.0 2023-11-26 10:28:56,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3346646.6666666665, ans=0.09899494936611666 2023-11-26 10:29:10,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3346780.0, ans=0.125 2023-11-26 10:29:21,682 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9050, loss[loss=0.06021, simple_loss=0.08213, pruned_loss=0.009687, audio_tagging_loss=0.00946, over 14332.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09014, pruned_loss=0.01257, audio_tagging_loss=0.008619, over 3056922.90 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:29:27,913 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. 
limit=15.0 2023-11-26 10:29:34,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3346913.3333333335, ans=0.2 2023-11-26 10:29:36,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3346913.3333333335, ans=0.0 2023-11-26 10:29:44,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502050 2023-11-26 10:29:52,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3346980.0, ans=0.2 2023-11-26 10:30:07,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3347113.3333333335, ans=0.125 2023-11-26 10:30:11,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3347113.3333333335, ans=0.0 2023-11-26 10:30:17,416 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9100, loss[loss=0.06914, simple_loss=0.08578, pruned_loss=0.01684, audio_tagging_loss=0.009409, over 14871.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09119, pruned_loss=0.01274, audio_tagging_loss=0.008493, over 3055481.70 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:30:18,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.587e+01 9.359e+01 1.002e+02 1.217e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 10:30:23,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-26 10:30:29,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3347246.6666666665, ans=0.125 2023-11-26 10:30:29,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3347246.6666666665, ans=0.125 2023-11-26 10:30:33,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3347246.6666666665, ans=0.125 2023-11-26 10:30:41,773 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-26 10:30:54,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2023-11-26 10:31:01,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3347446.6666666665, ans=0.1 2023-11-26 10:31:13,163 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9150, loss[loss=0.0722, simple_loss=0.093, pruned_loss=0.01869, audio_tagging_loss=0.007011, over 14373.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.0907, pruned_loss=0.01272, audio_tagging_loss=0.008475, over 3048189.19 frames. 
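At batch 9000 above the trainer pauses for a validation pass over a fixed 4681554-frame dev set and then logs the GPU memory high-water mark (26096MB). A sketch of equivalent bookkeeping; the model interface is hypothetical, but torch.cuda.max_memory_allocated is the standard PyTorch call for a peak-memory figure of this kind:

import torch

@torch.no_grad()
def compute_validation_loss(model, dev_loader):
    model.eval()
    total, frames = 0.0, 0
    for batch in dev_loader:
        loss, num_frames = model(batch)   # hypothetical interface
        total += loss.item() * num_frames
        frames += num_frames
    model.train()
    return total / frames                 # frame-weighted, like the log

def peak_memory_mb(device) -> int:
    # Peak bytes ever allocated on this device since tracking began.
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)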
], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:31:17,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3347513.3333333335, ans=0.0 2023-11-26 10:31:28,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3347580.0, ans=0.0 2023-11-26 10:31:32,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3347580.0, ans=0.07 2023-11-26 10:31:37,543 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-26 10:31:48,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3347713.3333333335, ans=0.125 2023-11-26 10:31:53,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-11-26 10:31:59,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3347780.0, ans=0.0 2023-11-26 10:32:03,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2023-11-26 10:32:10,069 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9200, loss[loss=0.05202, simple_loss=0.06992, pruned_loss=0.00855, audio_tagging_loss=0.008513, over 15483.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09043, pruned_loss=0.0127, audio_tagging_loss=0.008543, over 3049760.67 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:32:11,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.621e+01 9.324e+01 1.045e+02 1.374e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 10:32:32,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3347980.0, ans=0.1 2023-11-26 10:32:33,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502200 2023-11-26 10:32:38,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3347980.0, ans=0.125 2023-11-26 10:32:40,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3347980.0, ans=0.5 2023-11-26 10:32:42,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=15.0 2023-11-26 10:32:47,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3348046.6666666665, ans=0.2 2023-11-26 10:33:06,602 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9250, loss[loss=0.06643, simple_loss=0.0869, pruned_loss=0.01316, audio_tagging_loss=0.009822, over 14098.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09017, pruned_loss=0.01261, audio_tagging_loss=0.008596, over 3050761.88 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:33:16,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3348246.6666666665, ans=0.1 2023-11-26 10:33:16,728 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.72 vs. 
limit=22.5 2023-11-26 10:33:19,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3348246.6666666665, ans=0.0 2023-11-26 10:33:30,087 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502250 2023-11-26 10:33:35,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3348313.3333333335, ans=0.07 2023-11-26 10:33:37,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3348313.3333333335, ans=0.1 2023-11-26 10:34:00,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3348446.6666666665, ans=0.125 2023-11-26 10:34:02,022 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9300, loss[loss=0.06715, simple_loss=0.08421, pruned_loss=0.01228, audio_tagging_loss=0.01276, over 15075.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08906, pruned_loss=0.01236, audio_tagging_loss=0.00871, over 3044700.09 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:34:03,055 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.725e+01 9.420e+01 1.023e+02 1.550e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 10:34:05,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3348513.3333333335, ans=0.125 2023-11-26 10:34:07,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3348513.3333333335, ans=0.1 2023-11-26 10:34:13,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2023-11-26 10:34:17,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3348580.0, ans=0.125 2023-11-26 10:34:26,042 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502300 2023-11-26 10:34:33,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2023-11-26 10:34:57,935 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9350, loss[loss=0.07513, simple_loss=0.09144, pruned_loss=0.01831, audio_tagging_loss=0.0111, over 14790.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08834, pruned_loss=0.01234, audio_tagging_loss=0.008847, over 3038322.13 frames. 
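The optim.py:476 records summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) followed by a clipping threshold and the share of clipped batches. In each record here the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.420e+01 = 1.884e+02 just above. A sketch of that bookkeeping; the window length and history handling are assumptions:

import torch

def grad_norm_summary(model, history, clipping_scale=2.0, window=200):
    # Total L2 norm of all parameter gradients for this batch.
    norms = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    total = torch.norm(torch.stack(norms)).item()
    history.append(total)
    recent = torch.tensor(history[-window:])
    quartiles = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()  # 2.0 * median, as logged
    return quartiles.tolist(), threshold, total > threshold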
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:35:04,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3348846.6666666665, ans=0.125 2023-11-26 10:35:08,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3348913.3333333335, ans=0.0 2023-11-26 10:35:13,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3348913.3333333335, ans=0.125 2023-11-26 10:35:14,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3348913.3333333335, ans=0.1 2023-11-26 10:35:21,319 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502350 2023-11-26 10:35:23,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3348980.0, ans=0.125 2023-11-26 10:35:32,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3349046.6666666665, ans=0.125 2023-11-26 10:35:35,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3349046.6666666665, ans=0.125 2023-11-26 10:35:45,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349113.3333333335, ans=0.1 2023-11-26 10:35:50,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3349113.3333333335, ans=0.0 2023-11-26 10:35:54,261 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9400, loss[loss=0.07217, simple_loss=0.1002, pruned_loss=0.01486, audio_tagging_loss=0.007222, over 15775.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08915, pruned_loss=0.01241, audio_tagging_loss=0.00886, over 3046555.75 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:35:55,295 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.779e+01 9.527e+01 1.025e+02 1.453e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 10:36:11,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3349246.6666666665, ans=0.0 2023-11-26 10:36:12,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-26 10:36:17,346 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502400 2023-11-26 10:36:20,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3349313.3333333335, ans=0.125 2023-11-26 10:36:34,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. 
limit=6.0 2023-11-26 10:36:41,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3349446.6666666665, ans=0.125 2023-11-26 10:36:41,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3349446.6666666665, ans=0.07 2023-11-26 10:36:49,338 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:36:50,157 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9450, loss[loss=0.07995, simple_loss=0.1032, pruned_loss=0.01901, audio_tagging_loss=0.009334, over 14985.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08983, pruned_loss=0.01233, audio_tagging_loss=0.008902, over 3046139.47 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:36:50,192 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:36:51,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3349513.3333333335, ans=0.125 2023-11-26 10:36:53,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3349513.3333333335, ans=0.125 2023-11-26 10:36:56,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349513.3333333335, ans=0.1 2023-11-26 10:37:02,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349580.0, ans=0.1 2023-11-26 10:37:14,688 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502450 2023-11-26 10:37:19,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2023-11-26 10:37:24,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3349713.3333333335, ans=0.2 2023-11-26 10:37:24,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2023-11-26 10:37:27,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3349713.3333333335, ans=0.0 2023-11-26 10:37:33,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3349780.0, ans=0.0 2023-11-26 10:37:46,023 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9500, loss[loss=0.05815, simple_loss=0.08014, pruned_loss=0.007656, audio_tagging_loss=0.01042, over 15389.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09043, pruned_loss=0.01247, audio_tagging_loss=0.008939, over 3047373.29 frames. 
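The scaling.py:213 records print the current value (ans) of ScheduledFloat hyperparameters (dropout probabilities, skip rates, balancer limits) as a function of batch_count. A simplified stand-in that treats the schedule as piecewise-linear interpolation over (batch_count, value) breakpoints, clamped at both ends; the breakpoints in the example are illustrative, not read from this run:

import bisect

class ScheduledFloatSketch:
    def __init__(self, *points):
        self.points = sorted(points)   # (batch_count, value) breakpoints

    def value_at(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]    # clamp below the first breakpoint
        if i == len(self.points):
            return self.points[-1][1]   # clamp above the last breakpoint
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout_p decaying 0.3 -> 0.1 over the first 20k batches has long since
# reached its final value at batch_count=3349580.0, as in the records above.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value_at(3349580.0) == 0.1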
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:37:46,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3349846.6666666665, ans=0.125 2023-11-26 10:37:47,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.787e+01 9.530e+01 1.013e+02 1.442e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 10:37:49,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3349846.6666666665, ans=0.125 2023-11-26 10:37:54,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3349846.6666666665, ans=10.0 2023-11-26 10:37:54,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3349846.6666666665, ans=0.0 2023-11-26 10:38:05,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2023-11-26 10:38:07,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3349980.0, ans=0.125 2023-11-26 10:38:09,482 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502500 2023-11-26 10:38:09,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3349980.0, ans=0.125 2023-11-26 10:38:13,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3349980.0, ans=0.1 2023-11-26 10:38:42,487 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9550, loss[loss=0.088, simple_loss=0.1111, pruned_loss=0.02325, audio_tagging_loss=0.009209, over 15311.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09097, pruned_loss=0.01249, audio_tagging_loss=0.008936, over 3054880.31 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:38:57,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3350246.6666666665, ans=0.04949747468305833 2023-11-26 10:38:59,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=12.0 2023-11-26 10:39:05,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502550 2023-11-26 10:39:07,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350313.3333333335, ans=0.1 2023-11-26 10:39:07,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3350313.3333333335, ans=0.0 2023-11-26 10:39:08,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3350313.3333333335, ans=0.2 2023-11-26 10:39:13,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0 2023-11-26 10:39:31,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.63 vs. 
limit=15.0 2023-11-26 10:39:35,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3350446.6666666665, ans=0.04949747468305833 2023-11-26 10:39:37,563 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9600, loss[loss=0.06824, simple_loss=0.09385, pruned_loss=0.01318, audio_tagging_loss=0.008136, over 15543.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09014, pruned_loss=0.01225, audio_tagging_loss=0.009001, over 3053754.91 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:39:38,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.684e+01 9.310e+01 1.004e+02 1.298e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 10:39:46,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3350513.3333333335, ans=0.125 2023-11-26 10:39:52,456 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2023-11-26 10:40:01,561 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502600 2023-11-26 10:40:07,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2023-11-26 10:40:11,469 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2023-11-26 10:40:33,894 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9650, loss[loss=0.07478, simple_loss=0.09746, pruned_loss=0.01674, audio_tagging_loss=0.009316, over 15237.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08932, pruned_loss=0.01227, audio_tagging_loss=0.009042, over 3048756.22 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:40:42,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3350846.6666666665, ans=0.0 2023-11-26 10:40:57,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502650 2023-11-26 10:41:15,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3351046.6666666665, ans=0.125 2023-11-26 10:41:30,427 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9700, loss[loss=0.05212, simple_loss=0.07231, pruned_loss=0.006787, audio_tagging_loss=0.009173, over 15928.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08896, pruned_loss=0.01211, audio_tagging_loss=0.008911, over 3045796.76 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:41:31,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.780e+01 9.294e+01 1.006e+02 1.332e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 10:41:54,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502700 2023-11-26 10:42:26,555 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9750, loss[loss=0.04646, simple_loss=0.05843, pruned_loss=0.005466, audio_tagging_loss=0.01177, over 14913.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08883, pruned_loss=0.01226, audio_tagging_loss=0.008849, over 3037539.23 frames. 
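Across this stretch grad_scale halves from 32.0 to 8.0 (around batch 8900) and climbs back to 32.0 (by batch 9600). That staircase is the signature of dynamic loss scaling in mixed-precision training: the scaler backs off when a step overflows in fp16 and grows again after a run of stable steps. A generic PyTorch sketch; the constructor values are an assumed configuration, not read from this log:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def fp16_step(optimizer, loss):
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped internally if grads overflowed
    scaler.update()                # halve on overflow, grow when stable
    return scaler.get_scale()      # the value surfaced as grad_scale above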
], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:42:26,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3351513.3333333335, ans=0.0 2023-11-26 10:42:28,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3351513.3333333335, ans=0.125 2023-11-26 10:42:49,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502750 2023-11-26 10:42:52,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3351646.6666666665, ans=0.0 2023-11-26 10:43:21,512 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:43:22,282 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9800, loss[loss=0.05338, simple_loss=0.06417, pruned_loss=0.0088, audio_tagging_loss=0.0125, over 16490.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08906, pruned_loss=0.01223, audio_tagging_loss=0.008823, over 3039308.87 frames. ], batch size: 65, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:43:23,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.016e+01 9.407e+01 1.014e+02 1.286e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 10:43:26,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0 2023-11-26 10:43:28,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0 2023-11-26 10:43:38,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3351913.3333333335, ans=0.125 2023-11-26 10:43:45,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502800 2023-11-26 10:43:47,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3351980.0, ans=0.0 2023-11-26 10:44:14,087 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:44:15,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352113.3333333335, ans=0.1 2023-11-26 10:44:18,861 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9850, loss[loss=0.04742, simple_loss=0.06744, pruned_loss=0.00531, audio_tagging_loss=0.00839, over 16254.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08825, pruned_loss=0.01206, audio_tagging_loss=0.008836, over 3045108.71 frames. 
], batch size: 63, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:44:20,100 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:44:38,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3352246.6666666665, ans=0.125 2023-11-26 10:44:41,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502850 2023-11-26 10:44:44,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3352313.3333333335, ans=0.0 2023-11-26 10:44:45,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-26 10:44:46,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3352313.3333333335, ans=0.0 2023-11-26 10:45:00,536 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:45:10,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3352446.6666666665, ans=0.125 2023-11-26 10:45:11,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3352446.6666666665, ans=0.125 2023-11-26 10:45:14,243 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9900, loss[loss=0.06394, simple_loss=0.09192, pruned_loss=0.01078, audio_tagging_loss=0.007197, over 16063.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0889, pruned_loss=0.01209, audio_tagging_loss=0.008792, over 3047946.03 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:45:16,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.825e+01 9.361e+01 1.007e+02 1.352e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 10:45:24,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2023-11-26 10:45:31,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3352580.0, ans=0.125 2023-11-26 10:45:32,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3352580.0, ans=0.0 2023-11-26 10:45:32,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3352580.0, ans=0.05 2023-11-26 10:45:38,596 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502900 2023-11-26 10:45:59,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352780.0, ans=0.1 2023-11-26 10:46:11,022 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9950, loss[loss=0.05761, simple_loss=0.07498, pruned_loss=0.01009, audio_tagging_loss=0.01004, over 14587.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08806, pruned_loss=0.01199, audio_tagging_loss=0.00885, over 3042478.84 frames. 
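The scaling.py:1022 records compare a per-module whitening metric against a limit (metric=2.95 vs. limit=6.0 above, for whiten_keys with num_groups=4). A rough reading of such a metric: how far each channel group's covariance is from a multiple of the identity, normalized so that 1.0 means fully whitened. This is a paraphrase of the idea, not the exact training code:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels are split into groups.
    per_group = x.shape[-1] // num_groups
    x = x.reshape(-1, num_groups, per_group).transpose(0, 1)
    cov = x.transpose(1, 2) @ x / x.shape[1]      # per-group covariance
    diag_mean = cov.diagonal(dim1=1, dim2=2).mean()
    # 1.0 when cov is a scaled identity; grows as channels correlate.
    return ((cov ** 2).mean() / (diag_mean ** 2 / per_group)).item()

print(whitening_metric(torch.randn(1000, 256), num_groups=4))  # ~1.0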
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:46:15,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3352846.6666666665, ans=0.125 2023-11-26 10:46:21,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2023-11-26 10:46:23,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3352913.3333333335, ans=0.125 2023-11-26 10:46:34,424 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502950 2023-11-26 10:46:40,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3352980.0, ans=0.0 2023-11-26 10:46:57,071 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:47:00,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3353113.3333333335, ans=0.125 2023-11-26 10:47:06,955 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10000, loss[loss=0.04301, simple_loss=0.06009, pruned_loss=0.003904, audio_tagging_loss=0.009059, over 16072.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08767, pruned_loss=0.01198, audio_tagging_loss=0.008776, over 3043730.66 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:47:09,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 8.822e+01 9.378e+01 1.009e+02 1.316e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 10:47:30,543 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503000 2023-11-26 10:47:31,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2023-11-26 10:47:41,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3353380.0, ans=0.125 2023-11-26 10:48:03,174 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10050, loss[loss=0.06465, simple_loss=0.08597, pruned_loss=0.01253, audio_tagging_loss=0.009124, over 15461.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.0879, pruned_loss=0.01202, audio_tagging_loss=0.008787, over 3044522.65 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:48:06,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3353513.3333333335, ans=0.125 2023-11-26 10:48:14,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3353580.0, ans=0.0 2023-11-26 10:48:27,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503050 2023-11-26 10:48:40,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3353713.3333333335, ans=0.2 2023-11-26 10:48:59,504 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10100, loss[loss=0.07373, simple_loss=0.09407, pruned_loss=0.01584, audio_tagging_loss=0.01085, over 15486.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08859, pruned_loss=0.01218, audio_tagging_loss=0.008814, over 3044776.01 frames. 
], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:49:02,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.507e+01 9.128e+01 1.020e+02 1.362e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-26 10:49:18,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3353913.3333333335, ans=0.125 2023-11-26 10:49:18,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3353913.3333333335, ans=0.125 2023-11-26 10:49:22,962 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503100 2023-11-26 10:49:23,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-26 10:49:37,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3354046.6666666665, ans=0.05 2023-11-26 10:49:45,224 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:49:54,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3354180.0, ans=0.125 2023-11-26 10:49:54,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3354180.0, ans=0.125 2023-11-26 10:49:55,355 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10150, loss[loss=0.07895, simple_loss=0.1134, pruned_loss=0.01139, audio_tagging_loss=0.01084, over 16887.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08898, pruned_loss=0.0122, audio_tagging_loss=0.008895, over 3049736.45 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:50:03,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3354180.0, ans=0.0 2023-11-26 10:50:07,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3354246.6666666665, ans=0.125 2023-11-26 10:50:14,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3354246.6666666665, ans=0.125 2023-11-26 10:50:16,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3354313.3333333335, ans=0.125 2023-11-26 10:50:18,269 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503150 2023-11-26 10:50:22,567 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 10:50:36,706 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:50:50,980 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10200, loss[loss=0.08155, simple_loss=0.112, pruned_loss=0.01625, audio_tagging_loss=0.009291, over 14543.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08905, pruned_loss=0.01212, audio_tagging_loss=0.008921, over 3053928.47 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:50:54,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.608e+01 9.223e+01 1.008e+02 1.287e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 10:50:56,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3354513.3333333335, ans=0.125 2023-11-26 10:51:00,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2023-11-26 10:51:01,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3354580.0, ans=0.1 2023-11-26 10:51:13,274 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:51:14,391 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503200 2023-11-26 10:51:20,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3354646.6666666665, ans=0.125 2023-11-26 10:51:37,381 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-26 10:51:46,351 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10250, loss[loss=0.07333, simple_loss=0.1043, pruned_loss=0.01355, audio_tagging_loss=0.007618, over 14406.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08988, pruned_loss=0.01218, audio_tagging_loss=0.008909, over 3049265.52 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:51:47,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3354846.6666666665, ans=0.2 2023-11-26 10:52:03,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3354913.3333333335, ans=0.1 2023-11-26 10:52:10,948 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503250 2023-11-26 10:52:19,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3355046.6666666665, ans=0.025 2023-11-26 10:52:19,748 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. 
limit=15.0 2023-11-26 10:52:21,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3355046.6666666665, ans=0.0 2023-11-26 10:52:43,323 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10300, loss[loss=0.0818, simple_loss=0.1057, pruned_loss=0.02079, audio_tagging_loss=0.008164, over 15245.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08967, pruned_loss=0.01231, audio_tagging_loss=0.008969, over 3045739.01 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:52:46,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.732e+01 9.378e+01 1.015e+02 1.295e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 10:52:58,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3355246.6666666665, ans=10.0 2023-11-26 10:53:02,522 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2023-11-26 10:53:06,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503300 2023-11-26 10:53:22,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3355380.0, ans=0.125 2023-11-26 10:53:26,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3355380.0, ans=0.1 2023-11-26 10:53:27,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3355446.6666666665, ans=0.0 2023-11-26 10:53:39,419 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10350, loss[loss=0.06559, simple_loss=0.0904, pruned_loss=0.01016, audio_tagging_loss=0.01022, over 14884.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08988, pruned_loss=0.01238, audio_tagging_loss=0.008995, over 3038732.33 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:53:48,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3355513.3333333335, ans=0.125 2023-11-26 10:53:57,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3355580.0, ans=0.2 2023-11-26 10:53:57,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2023-11-26 10:54:02,151 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503350 2023-11-26 10:54:34,668 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10400, loss[loss=0.04914, simple_loss=0.06716, pruned_loss=0.00471, audio_tagging_loss=0.01085, over 15338.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08922, pruned_loss=0.01228, audio_tagging_loss=0.009148, over 3040526.84 frames. 
], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:54:37,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.834e+01 9.411e+01 9.985e+01 1.468e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 10:54:39,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3355846.6666666665, ans=0.0 2023-11-26 10:54:55,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3355913.3333333335, ans=0.1 2023-11-26 10:54:58,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503400 2023-11-26 10:55:01,379 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:55:30,878 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10450, loss[loss=0.0819, simple_loss=0.1113, pruned_loss=0.0178, audio_tagging_loss=0.008451, over 15359.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.0901, pruned_loss=0.01228, audio_tagging_loss=0.009029, over 3038217.41 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:55:35,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3356180.0, ans=0.0 2023-11-26 10:55:54,506 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503450 2023-11-26 10:56:22,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3356446.6666666665, ans=0.0 2023-11-26 10:56:24,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3356446.6666666665, ans=0.0 2023-11-26 10:56:27,221 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10500, loss[loss=0.05467, simple_loss=0.07374, pruned_loss=0.01125, audio_tagging_loss=0.006548, over 14687.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.0899, pruned_loss=0.01237, audio_tagging_loss=0.008966, over 3044435.11 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:56:30,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.592e+01 9.300e+01 9.951e+01 1.449e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 10:56:30,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0 2023-11-26 10:56:36,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.99 vs. 
limit=12.0 2023-11-26 10:56:38,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3356580.0, ans=0.125 2023-11-26 10:56:50,270 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503500 2023-11-26 10:56:50,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3356646.6666666665, ans=0.05 2023-11-26 10:56:54,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3356646.6666666665, ans=0.1 2023-11-26 10:57:05,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3356713.3333333335, ans=0.125 2023-11-26 10:57:22,550 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10550, loss[loss=0.0692, simple_loss=0.0935, pruned_loss=0.01205, audio_tagging_loss=0.01039, over 14470.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09013, pruned_loss=0.0125, audio_tagging_loss=0.008772, over 3051504.16 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:57:28,112 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:57:34,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3356913.3333333335, ans=0.125 2023-11-26 10:57:40,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3356913.3333333335, ans=0.0 2023-11-26 10:57:47,199 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503550 2023-11-26 10:57:51,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3356980.0, ans=0.2 2023-11-26 10:58:01,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3357046.6666666665, ans=0.125 2023-11-26 10:58:02,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3357046.6666666665, ans=0.0 2023-11-26 10:58:18,534 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10600, loss[loss=0.05693, simple_loss=0.0722, pruned_loss=0.01204, audio_tagging_loss=0.008788, over 15634.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08955, pruned_loss=0.01227, audio_tagging_loss=0.008768, over 3044021.82 frames. 
], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:58:18,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3357180.0, ans=0.0 2023-11-26 10:58:19,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357180.0, ans=0.1 2023-11-26 10:58:23,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.911e+01 9.725e+01 1.038e+02 1.409e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-26 10:58:25,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3357180.0, ans=0.0 2023-11-26 10:58:31,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3357246.6666666665, ans=0.125 2023-11-26 10:58:39,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3357246.6666666665, ans=0.125 2023-11-26 10:58:42,556 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503600 2023-11-26 10:58:49,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3357313.3333333335, ans=0.015 2023-11-26 10:58:52,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3357380.0, ans=0.125 2023-11-26 10:58:57,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3357380.0, ans=0.1 2023-11-26 10:58:58,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3357380.0, ans=0.95 2023-11-26 10:58:58,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357380.0, ans=0.1 2023-11-26 10:59:07,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3357446.6666666665, ans=0.2 2023-11-26 10:59:13,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2023-11-26 10:59:15,782 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10650, loss[loss=0.06729, simple_loss=0.09259, pruned_loss=0.01296, audio_tagging_loss=0.008031, over 15104.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08963, pruned_loss=0.01234, audio_tagging_loss=0.008734, over 3040144.21 frames. 
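
The optim.py lines above print five "grad-norm quartiles" (which read as min/25%/50%/75%/max of a window of recent gradient norms) plus a threshold; throughout this excerpt the threshold equals Clipping_scale times the middle value (2.0 * 9.725e+01 = 1.945e+02 in the line above). A sketch of that bookkeeping, reconstructed from the log rather than from optim.py:

import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five summary points of the recent grad-norm window, as logged.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]             # scale times the median
    percent_clipped = (grad_norms > threshold).float().mean() * 100.0
    return q, threshold, percent_clipped

norms = 90.0 + 10.0 * torch.randn(200).abs()      # stand-in for recent grad norms
quartiles, thr, pct = clipping_stats(norms)
print(quartiles, thr, pct)                        # cf. "percent-clipped=0.0"
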
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:59:23,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3357513.3333333335, ans=0.0 2023-11-26 10:59:25,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357580.0, ans=0.1 2023-11-26 10:59:38,845 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503650 2023-11-26 10:59:47,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357713.3333333335, ans=0.1 2023-11-26 10:59:50,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3357713.3333333335, ans=0.2 2023-11-26 10:59:58,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-11-26 11:00:10,589 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10700, loss[loss=0.06213, simple_loss=0.07204, pruned_loss=0.0144, audio_tagging_loss=0.01171, over 15159.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08906, pruned_loss=0.01219, audio_tagging_loss=0.008698, over 3037770.29 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:00:14,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.978e+01 9.499e+01 1.034e+02 2.026e+02, threshold=1.900e+02, percent-clipped=1.0 2023-11-26 11:00:18,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3357846.6666666665, ans=0.0 2023-11-26 11:00:27,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3357913.3333333335, ans=0.125 2023-11-26 11:00:32,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3357980.0, ans=0.125 2023-11-26 11:00:32,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2023-11-26 11:00:34,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503700 2023-11-26 11:00:35,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2023-11-26 11:00:46,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3358046.6666666665, ans=0.1 2023-11-26 11:00:50,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2023-11-26 11:01:06,432 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10750, loss[loss=0.05017, simple_loss=0.06038, pruned_loss=0.009035, audio_tagging_loss=0.01095, over 16299.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08877, pruned_loss=0.01215, audio_tagging_loss=0.00875, over 3038274.30 frames. 
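
The scaling.py:213 lines report ScheduledFloat values: named hyperparameters (skip rates, dropout probabilities, balancer probs, bypass scale_min) that are functions of batch_count. By this point in training (batch_count around 3.36e6) every schedule in the excerpt has flattened to a constant, which is why each name always prints the same ans. A piecewise-linear schedule of this kind might look like the sketch below; the breakpoints are illustrative assumptions, not values from the recipe:

def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Piecewise-linear in batch_count over sorted (batch, value) breakpoints;
    constant outside the covered range."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# Hypothetical dropout schedule: 0.3 at the start, 0.1 after 20k batches.
dropout = [(0.0, 0.3), (20000.0, 0.1)]
print(scheduled_float(3357580.0, dropout))  # -> 0.1, matching the ans above
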
], batch size: 63, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:01:12,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3358180.0, ans=0.0 2023-11-26 11:01:12,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3358180.0, ans=0.125 2023-11-26 11:01:18,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5 2023-11-26 11:01:30,442 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503750 2023-11-26 11:01:49,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3358380.0, ans=0.125 2023-11-26 11:02:02,704 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10800, loss[loss=0.05956, simple_loss=0.07995, pruned_loss=0.01145, audio_tagging_loss=0.00813, over 15458.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08848, pruned_loss=0.01208, audio_tagging_loss=0.008726, over 3045925.93 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:02:07,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.688e+01 9.211e+01 9.904e+01 2.001e+02, threshold=1.842e+02, percent-clipped=1.0 2023-11-26 11:02:17,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3358580.0, ans=0.0 2023-11-26 11:02:22,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3358580.0, ans=0.09899494936611666 2023-11-26 11:02:25,505 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503800 2023-11-26 11:02:56,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3358780.0, ans=0.0 2023-11-26 11:02:58,749 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10850, loss[loss=0.07299, simple_loss=0.08874, pruned_loss=0.01761, audio_tagging_loss=0.01101, over 14978.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08996, pruned_loss=0.01257, audio_tagging_loss=0.008628, over 3047568.20 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:03:05,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3358846.6666666665, ans=0.1 2023-11-26 11:03:22,354 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503850 2023-11-26 11:03:30,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=15.0 2023-11-26 11:03:40,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3359046.6666666665, ans=0.125 2023-11-26 11:03:51,970 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 11:03:54,709 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10900, loss[loss=0.0667, simple_loss=0.09284, pruned_loss=0.01248, audio_tagging_loss=0.007793, over 14623.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08928, pruned_loss=0.01239, audio_tagging_loss=0.008649, over 3045666.90 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:03:58,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.941e+01 9.584e+01 1.034e+02 1.250e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 11:04:18,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503900 2023-11-26 11:04:50,465 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10950, loss[loss=0.06631, simple_loss=0.08845, pruned_loss=0.01329, audio_tagging_loss=0.00879, over 15869.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08907, pruned_loss=0.01232, audio_tagging_loss=0.008674, over 3048409.24 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:04:50,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=22.5 2023-11-26 11:04:56,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-11-26 11:05:11,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-26 11:05:13,774 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503950 2023-11-26 11:05:14,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2023-11-26 11:05:31,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3359713.3333333335, ans=0.025 2023-11-26 11:05:31,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3359713.3333333335, ans=0.0 2023-11-26 11:05:35,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2023-11-26 11:05:36,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3359780.0, ans=0.1 2023-11-26 11:05:43,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3359780.0, ans=0.125 2023-11-26 11:05:46,838 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11000, loss[loss=0.06703, simple_loss=0.09117, pruned_loss=0.01276, audio_tagging_loss=0.008687, over 13717.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08798, pruned_loss=0.01212, audio_tagging_loss=0.00874, over 3049198.76 frames. 
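
The WARNING above shows the filter for degenerate cuts: this AudioSet placeholder clip has 100 feature frames, only 23 frames after the roughly 4x subsampling of the encoder front-end, but 24 BPE tokens, so the transducer loss cannot align it and the cut is dropped. The sketch below reproduces the printed arithmetic; the (T - 7) // 4 front-end shrinkage and the exact comparison are assumptions, not code lifted from train_asr.py:

def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # Assumed front-end shrinkage; it reproduces 100 -> 23 as in the log.
    frames_after = (num_frames - 7) // subsampling_factor
    # A transducer needs at least one frame per output token.
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # -> False: the cut is excluded, as warned above
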
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:05:46,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3359846.6666666665, ans=0.2 2023-11-26 11:05:52,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.584e+01 9.485e+01 1.002e+02 1.136e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 11:05:56,471 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:06:01,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3359913.3333333335, ans=0.125 2023-11-26 11:06:06,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3359913.3333333335, ans=0.0 2023-11-26 11:06:06,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3359913.3333333335, ans=0.2 2023-11-26 11:06:10,135 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504000 2023-11-26 11:06:15,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3359980.0, ans=0.125 2023-11-26 11:06:26,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3360046.6666666665, ans=0.0 2023-11-26 11:06:44,384 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11050, loss[loss=0.06981, simple_loss=0.1035, pruned_loss=0.01263, audio_tagging_loss=0.005438, over 14741.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08893, pruned_loss=0.01228, audio_tagging_loss=0.008803, over 3056983.49 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:07:08,387 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504050 2023-11-26 11:07:08,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3360313.3333333335, ans=0.125 2023-11-26 11:07:14,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3360313.3333333335, ans=0.0 2023-11-26 11:07:19,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3360380.0, ans=0.0 2023-11-26 11:07:25,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-26 11:07:25,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3360380.0, ans=0.07 2023-11-26 11:07:35,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3360446.6666666665, ans=0.125 2023-11-26 11:07:38,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. 
limit=12.0 2023-11-26 11:07:40,713 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11100, loss[loss=0.05469, simple_loss=0.0712, pruned_loss=0.009286, audio_tagging_loss=0.009803, over 13917.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08914, pruned_loss=0.01216, audio_tagging_loss=0.008795, over 3065114.61 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:07:46,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.699e+01 9.322e+01 9.971e+01 1.375e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-26 11:07:52,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3360580.0, ans=0.1 2023-11-26 11:08:04,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504100 2023-11-26 11:08:11,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-11-26 11:08:36,570 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11150, loss[loss=0.05903, simple_loss=0.07885, pruned_loss=0.01165, audio_tagging_loss=0.007958, over 15161.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08934, pruned_loss=0.01222, audio_tagging_loss=0.008816, over 3066870.10 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:08:41,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3360846.6666666665, ans=0.0 2023-11-26 11:08:48,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3360913.3333333335, ans=0.125 2023-11-26 11:09:00,113 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504150 2023-11-26 11:09:12,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3361046.6666666665, ans=0.125 2023-11-26 11:09:26,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3361113.3333333335, ans=0.1 2023-11-26 11:09:32,594 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11200, loss[loss=0.06213, simple_loss=0.07949, pruned_loss=0.01408, audio_tagging_loss=0.008308, over 15198.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08955, pruned_loss=0.01219, audio_tagging_loss=0.008903, over 3057135.98 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:09:39,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.765e+01 9.322e+01 9.953e+01 1.270e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-26 11:09:54,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3361313.3333333335, ans=0.2 2023-11-26 11:09:56,416 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504200 2023-11-26 11:10:13,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3361380.0, ans=0.2 2023-11-26 11:10:23,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361446.6666666665, ans=0.1 2023-11-26 11:10:25,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3361446.6666666665, ans=0.0 2023-11-26 11:10:28,729 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11250, loss[loss=0.0809, simple_loss=0.1147, pruned_loss=0.01602, audio_tagging_loss=0.007527, over 14790.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08906, pruned_loss=0.01228, audio_tagging_loss=0.008866, over 3055504.52 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:10:44,345 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:10:51,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504250 2023-11-26 11:11:03,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3361713.3333333335, ans=0.1 2023-11-26 11:11:10,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3361713.3333333335, ans=0.125 2023-11-26 11:11:22,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3361780.0, ans=0.0 2023-11-26 11:11:24,495 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11300, loss[loss=0.05369, simple_loss=0.07061, pruned_loss=0.01019, audio_tagging_loss=0.008186, over 13867.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08908, pruned_loss=0.01211, audio_tagging_loss=0.008759, over 3048228.60 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:11:26,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3361846.6666666665, ans=0.0 2023-11-26 11:11:30,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.741e+01 9.355e+01 1.007e+02 1.157e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 11:11:35,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3361913.3333333335, ans=0.2 2023-11-26 11:11:36,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.86 vs. 
limit=12.0 2023-11-26 11:11:48,056 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504300 2023-11-26 11:11:48,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3361980.0, ans=0.125 2023-11-26 11:12:11,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3362113.3333333335, ans=0.0 2023-11-26 11:12:19,981 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11350, loss[loss=0.06442, simple_loss=0.0924, pruned_loss=0.01207, audio_tagging_loss=0.006153, over 15073.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0896, pruned_loss=0.01221, audio_tagging_loss=0.00863, over 3042129.78 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:12:29,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3362180.0, ans=0.0 2023-11-26 11:12:43,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3362313.3333333335, ans=0.0 2023-11-26 11:12:44,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504350 2023-11-26 11:13:10,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3362446.6666666665, ans=0.125 2023-11-26 11:13:15,813 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11400, loss[loss=0.05173, simple_loss=0.06671, pruned_loss=0.008381, audio_tagging_loss=0.009996, over 13812.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08936, pruned_loss=0.01215, audio_tagging_loss=0.008642, over 3043313.00 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:13:17,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3362513.3333333335, ans=0.125 2023-11-26 11:13:17,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3362513.3333333335, ans=0.125 2023-11-26 11:13:20,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3362513.3333333335, ans=0.0 2023-11-26 11:13:23,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.698e+01 9.567e+01 1.043e+02 1.467e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 11:13:39,030 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504400 2023-11-26 11:13:48,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3362713.3333333335, ans=0.125 2023-11-26 11:14:12,480 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11450, loss[loss=0.05699, simple_loss=0.07922, pruned_loss=0.01019, audio_tagging_loss=0.007185, over 14519.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09009, pruned_loss=0.01235, audio_tagging_loss=0.008621, over 3038293.21 frames. 
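
The "Whitening: ... metric=X vs. limit=Y" lines compare a per-module whitening statistic against its scheduled limit; presumably the penalty that keeps activations decorrelated only engages while the metric exceeds the limit. One plausible form of such a metric, equal to 1.0 when the channel covariance is proportional to the identity and growing as variance concentrates in fewer directions, is sketched below; the exact formula in scaling.py may differ:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations from one module.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    num_channels = x.shape[1]
    # 1.0 iff all covariance eigenvalues are equal; larger when not "white".
    return (cov ** 2).sum() * num_channels / cov.diagonal().sum() ** 2

white = torch.randn(1000, 256)
print(whitening_metric(white))                                  # close to 1
print(whitening_metric(white * torch.linspace(0.1, 3.0, 256)))  # well above 1
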
], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:14:17,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3362846.6666666665, ans=0.125 2023-11-26 11:14:31,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3362913.3333333335, ans=0.125 2023-11-26 11:14:35,879 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504450 2023-11-26 11:14:59,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3363113.3333333335, ans=0.125 2023-11-26 11:15:07,647 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11500, loss[loss=0.07219, simple_loss=0.1024, pruned_loss=0.01427, audio_tagging_loss=0.006726, over 17166.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08993, pruned_loss=0.01234, audio_tagging_loss=0.008581, over 3044059.39 frames. ], batch size: 65, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:15:15,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.706e+01 9.345e+01 1.007e+02 1.360e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 11:15:15,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3363180.0, ans=0.0 2023-11-26 11:15:32,354 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504500 2023-11-26 11:15:37,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2023-11-26 11:16:03,940 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11550, loss[loss=0.06323, simple_loss=0.0837, pruned_loss=0.01208, audio_tagging_loss=0.009296, over 15652.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08957, pruned_loss=0.01235, audio_tagging_loss=0.00858, over 3039242.89 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:16:27,544 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504550 2023-11-26 11:16:38,078 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:16:39,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3363713.3333333335, ans=0.125 2023-11-26 11:16:44,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3363713.3333333335, ans=0.125 2023-11-26 11:17:00,349 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11600, loss[loss=0.06503, simple_loss=0.09279, pruned_loss=0.00935, audio_tagging_loss=0.009285, over 14640.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08949, pruned_loss=0.01232, audio_tagging_loss=0.008585, over 3033003.15 frames. 
], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:17:03,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3363846.6666666665, ans=0.2 2023-11-26 11:17:08,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.748e+01 9.551e+01 1.048e+02 1.358e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 11:17:14,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3363913.3333333335, ans=0.2 2023-11-26 11:17:16,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3363913.3333333335, ans=0.1 2023-11-26 11:17:22,560 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504600 2023-11-26 11:17:52,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3364113.3333333335, ans=0.125 2023-11-26 11:17:55,414 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11650, loss[loss=0.0649, simple_loss=0.08524, pruned_loss=0.01082, audio_tagging_loss=0.01147, over 15515.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08921, pruned_loss=0.01217, audio_tagging_loss=0.008727, over 3033908.82 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:18:02,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3364180.0, ans=0.1 2023-11-26 11:18:05,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3364246.6666666665, ans=0.125 2023-11-26 11:18:09,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3364246.6666666665, ans=0.125 2023-11-26 11:18:18,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3364313.3333333335, ans=0.0 2023-11-26 11:18:19,662 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504650 2023-11-26 11:18:39,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3364446.6666666665, ans=0.2 2023-11-26 11:18:42,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3364446.6666666665, ans=0.2 2023-11-26 11:18:45,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3364446.6666666665, ans=0.07 2023-11-26 11:18:51,438 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11700, loss[loss=0.03509, simple_loss=0.03809, pruned_loss=0.006008, audio_tagging_loss=0.01004, over 14682.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08866, pruned_loss=0.01222, audio_tagging_loss=0.008856, over 3042089.69 frames. 
], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:18:51,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3364513.3333333335, ans=0.0 2023-11-26 11:19:00,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.724e+01 9.353e+01 9.996e+01 1.834e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 11:19:01,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3364513.3333333335, ans=0.125 2023-11-26 11:19:01,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3364513.3333333335, ans=15.0 2023-11-26 11:19:15,254 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504700 2023-11-26 11:19:18,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3364646.6666666665, ans=0.035 2023-11-26 11:19:27,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3364713.3333333335, ans=0.125 2023-11-26 11:19:36,130 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:19:36,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3364780.0, ans=0.1 2023-11-26 11:19:47,715 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11750, loss[loss=0.06805, simple_loss=0.08827, pruned_loss=0.01792, audio_tagging_loss=0.005992, over 14529.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08758, pruned_loss=0.01204, audio_tagging_loss=0.008936, over 3041117.73 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:19:54,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3364846.6666666665, ans=0.2 2023-11-26 11:19:58,058 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:20:10,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504750 2023-11-26 11:20:40,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3365113.3333333335, ans=0.125 2023-11-26 11:20:43,314 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11800, loss[loss=0.07316, simple_loss=0.1005, pruned_loss=0.01541, audio_tagging_loss=0.007511, over 14917.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08831, pruned_loss=0.01218, audio_tagging_loss=0.009039, over 3037786.88 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:20:46,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3365180.0, ans=0.125 2023-11-26 11:20:51,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.955e+01 9.712e+01 1.042e+02 1.352e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-26 11:20:51,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365180.0, ans=0.1 2023-11-26 11:21:06,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504800 2023-11-26 11:21:17,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3365380.0, ans=0.125 2023-11-26 11:21:26,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3365380.0, ans=0.125 2023-11-26 11:21:29,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3365446.6666666665, ans=0.025 2023-11-26 11:21:34,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=12.0 2023-11-26 11:21:39,227 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11850, loss[loss=0.04939, simple_loss=0.06192, pruned_loss=0.007361, audio_tagging_loss=0.01107, over 15442.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08844, pruned_loss=0.01224, audio_tagging_loss=0.009076, over 3038002.87 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:21:40,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3365513.3333333335, ans=0.125 2023-11-26 11:22:01,230 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:22:03,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504850 2023-11-26 11:22:10,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3365646.6666666665, ans=0.2 2023-11-26 11:22:18,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2023-11-26 11:22:29,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3365780.0, ans=0.125 2023-11-26 11:22:34,570 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11900, loss[loss=0.07325, simple_loss=0.0976, pruned_loss=0.01272, audio_tagging_loss=0.01173, over 14411.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08968, pruned_loss=0.01227, audio_tagging_loss=0.009038, over 3044287.13 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:22:34,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3365846.6666666665, ans=0.0 2023-11-26 11:22:37,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.73 vs. 
limit=12.0 2023-11-26 11:22:44,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.882e+01 9.383e+01 1.003e+02 1.296e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 11:22:58,070 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504900 2023-11-26 11:23:15,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2023-11-26 11:23:19,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3366113.3333333335, ans=0.0 2023-11-26 11:23:23,092 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:23:30,283 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11950, loss[loss=0.08652, simple_loss=0.1226, pruned_loss=0.01651, audio_tagging_loss=0.008729, over 15597.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08957, pruned_loss=0.0124, audio_tagging_loss=0.009171, over 3042206.90 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:23:38,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3366180.0, ans=0.1 2023-11-26 11:23:53,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504950 2023-11-26 11:23:54,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3366313.3333333335, ans=0.0 2023-11-26 11:24:12,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3366380.0, ans=0.125 2023-11-26 11:24:12,431 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:24:12,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=22.5 2023-11-26 11:24:24,557 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 12000, loss[loss=0.06252, simple_loss=0.08227, pruned_loss=0.01078, audio_tagging_loss=0.0106, over 14095.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08926, pruned_loss=0.01238, audio_tagging_loss=0.009242, over 3039231.78 frames. ], batch size: 52, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:24:24,558 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 11:24:57,252 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005274, audio_tagging_loss=0.02738, over 4681554.00 frames. 2023-11-26 11:24:57,253 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 11:25:00,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3366513.3333333335, ans=0.0 2023-11-26 11:25:01,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3366513.3333333335, ans=0.1 2023-11-26 11:25:04,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.83 vs. 
limit=10.0 2023-11-26 11:25:05,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.933e+01 9.493e+01 1.025e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 11:25:19,326 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505000 2023-11-26 11:25:19,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3366646.6666666665, ans=0.5 2023-11-26 11:25:50,627 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 0, loss[loss=0.06791, simple_loss=0.07503, pruned_loss=0.007398, audio_tagging_loss=0.02299, over 14746.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.07503, pruned_loss=0.007398, audio_tagging_loss=0.02299, over 14746.00 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:25:50,627 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 11:26:02,806 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.1126, 5.7825, 5.5141, 5.5600], device='cuda:2') 2023-11-26 11:26:21,928 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05779, simple_loss=0.05063, pruned_loss=0.005275, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-26 11:26:21,928 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 11:26:35,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3366740.0, ans=0.125 2023-11-26 11:26:38,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3366740.0, ans=0.0 2023-11-26 11:26:51,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3366806.6666666665, ans=0.125 2023-11-26 11:27:01,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3366873.3333333335, ans=0.125 2023-11-26 11:27:14,027 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505050 2023-11-26 11:27:17,142 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 50, loss[loss=0.07436, simple_loss=0.09169, pruned_loss=0.009738, audio_tagging_loss=0.01878, over 15113.00 frames. ], tot_loss[loss=0.07488, simple_loss=0.08975, pruned_loss=0.01271, audio_tagging_loss=0.01729, over 692857.18 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:27:30,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3367073.3333333335, ans=0.2 2023-11-26 11:27:33,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3367073.3333333335, ans=0.1 2023-11-26 11:27:33,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3367073.3333333335, ans=0.0 2023-11-26 11:27:37,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3367073.3333333335, ans=0.025 2023-11-26 11:27:45,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.12 vs. 
limit=12.0 2023-11-26 11:27:56,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.204e+01 9.484e+01 1.020e+02 1.096e+02 2.411e+02, threshold=2.041e+02, percent-clipped=1.0 2023-11-26 11:28:09,872 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505100 2023-11-26 11:28:13,115 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 100, loss[loss=0.04729, simple_loss=0.05581, pruned_loss=0.005437, audio_tagging_loss=0.01395, over 16583.00 frames. ], tot_loss[loss=0.07171, simple_loss=0.08633, pruned_loss=0.01204, audio_tagging_loss=0.01651, over 1216557.85 frames. ], batch size: 66, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:28:32,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3367406.6666666665, ans=0.0 2023-11-26 11:28:47,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3367540.0, ans=0.025 2023-11-26 11:29:02,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3367606.6666666665, ans=0.125 2023-11-26 11:29:06,285 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505150 2023-11-26 11:29:09,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-26 11:29:09,529 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 150, loss[loss=0.06665, simple_loss=0.09402, pruned_loss=0.011, audio_tagging_loss=0.008647, over 16076.00 frames. ], tot_loss[loss=0.07095, simple_loss=0.08801, pruned_loss=0.01217, audio_tagging_loss=0.01477, over 1631283.01 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:29:26,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3367740.0, ans=0.125 2023-11-26 11:29:46,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3367873.3333333335, ans=0.0 2023-11-26 11:29:49,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 9.187e+01 9.762e+01 1.032e+02 1.254e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-26 11:29:54,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3367940.0, ans=0.125 2023-11-26 11:29:55,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3367940.0, ans=0.95 2023-11-26 11:29:57,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3367940.0, ans=0.125 2023-11-26 11:29:59,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3367940.0, ans=0.0 2023-11-26 11:30:00,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3367940.0, ans=0.1 2023-11-26 11:30:02,307 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505200 2023-11-26 11:30:05,722 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 200, loss[loss=0.08113, simple_loss=0.111, pruned_loss=0.01745, audio_tagging_loss=0.008203, over 15796.00 frames. 
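
During the epoch-43 validation a little above, zipformer.py prints attn_weights_entropy = tensor([6.1126, 5.7825, 5.5141, 5.5600]), which reads as the average entropy (in nats) of each attention head's distribution; values near log(sequence length) indicate nearly uniform attention. A sketch of that diagnostic, with the reduction over batch and positions assumed:

import torch

def attn_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, batch, tgt_len, src_len); rows sum to 1.
    ent = -(attn_weights * (attn_weights + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=(1, 2))  # one average entropy per head

weights = torch.softmax(torch.randn(4, 2, 10, 500), dim=-1)
print(attn_entropy(weights))  # ~5-6 nats, same range as the logged tensor
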
], tot_loss[loss=0.06902, simple_loss=0.08775, pruned_loss=0.01202, audio_tagging_loss=0.01312, over 1943977.12 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:30:21,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3368073.3333333335, ans=0.0 2023-11-26 11:30:23,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=12.0 2023-11-26 11:30:25,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3368073.3333333335, ans=0.0 2023-11-26 11:30:34,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3368140.0, ans=0.05 2023-11-26 11:30:35,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.03 vs. limit=22.5 2023-11-26 11:30:44,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3368206.6666666665, ans=0.1 2023-11-26 11:30:47,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3368206.6666666665, ans=0.0 2023-11-26 11:30:54,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-26 11:30:56,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3368273.3333333335, ans=0.0 2023-11-26 11:30:58,361 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-26 11:31:02,025 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 250, loss[loss=0.07521, simple_loss=0.1006, pruned_loss=0.01613, audio_tagging_loss=0.008783, over 15446.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.08924, pruned_loss=0.0124, audio_tagging_loss=0.01182, over 2194256.76 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:31:26,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-26 11:31:28,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-26 11:31:38,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3368540.0, ans=0.125 2023-11-26 11:31:43,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.006e+01 9.717e+01 1.058e+02 1.490e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 11:31:54,689 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-26 11:31:58,283 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 300, loss[loss=0.08498, simple_loss=0.109, pruned_loss=0.02032, audio_tagging_loss=0.01013, over 15888.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.08972, pruned_loss=0.01253, audio_tagging_loss=0.01093, over 2383911.72 frames. 
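
The tot_loss bookkeeping is visible in the epoch transition above: at epoch 43 batch 0 tot_loss equals the batch loss over 14746 frames, the frame counter then climbs (692857 at batch 50, 1216557 at batch 100, 1943977 at batch 200), while in mid-epoch-42 it idled near 3.0e6. That pattern is consistent with an exponentially decayed running sum with decay (1 - 1/200), which at roughly 15k frames per batch saturates near 200 * 15k = 3.0e6. The decay constant is inferred from these numbers, not quoted from the script:

class DecayedLoss:
    """Frame-weighted running loss with exponential forgetting."""
    def __init__(self, reset_interval: int = 200) -> None:  # inferred constant
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_frames = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_frames = self.loss_frames * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_frames / self.frames   # the reported tot_loss

tracker = DecayedLoss()
for _ in range(50):
    tracker.update(0.07, 15000.0)
print(tracker.frames)  # ~6.7e5, close to "over 692857.18 frames" at batch 50
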
], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:32:02,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3368673.3333333335, ans=0.0 2023-11-26 11:32:05,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3368673.3333333335, ans=0.1 2023-11-26 11:32:36,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3368873.3333333335, ans=0.125 2023-11-26 11:32:50,424 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-26 11:32:54,065 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 350, loss[loss=0.0701, simple_loss=0.1065, pruned_loss=0.009224, audio_tagging_loss=0.007637, over 15384.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09053, pruned_loss=0.01237, audio_tagging_loss=0.01026, over 2537100.10 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:33:00,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3369006.6666666665, ans=0.125 2023-11-26 11:33:03,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3369073.3333333335, ans=0.1 2023-11-26 11:33:18,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3369140.0, ans=0.125 2023-11-26 11:33:30,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3369206.6666666665, ans=0.0 2023-11-26 11:33:35,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.731e+01 9.201e+01 9.958e+01 1.413e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:33:38,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2023-11-26 11:33:46,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-26 11:33:48,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2023-11-26 11:33:50,625 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 400, loss[loss=0.06009, simple_loss=0.07599, pruned_loss=0.01078, audio_tagging_loss=0.01131, over 16211.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09067, pruned_loss=0.01243, audio_tagging_loss=0.009789, over 2649482.74 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:34:06,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3369406.6666666665, ans=0.07 2023-11-26 11:34:08,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3369406.6666666665, ans=10.0 2023-11-26 11:34:09,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3369406.6666666665, ans=0.0 2023-11-26 11:34:43,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-26 11:34:46,991 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 450, loss[loss=0.0654, simple_loss=0.08392, pruned_loss=0.01145, audio_tagging_loss=0.01199, over 14013.00 frames. 
], tot_loss[loss=0.0673, simple_loss=0.09071, pruned_loss=0.01247, audio_tagging_loss=0.009471, over 2738940.12 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:35:03,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3369740.0, ans=0.125 2023-11-26 11:35:06,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3369740.0, ans=0.125 2023-11-26 11:35:16,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3369806.6666666665, ans=0.0 2023-11-26 11:35:27,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.871e+01 9.451e+01 1.026e+02 1.404e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 11:35:36,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3369940.0, ans=0.0 2023-11-26 11:35:39,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-26 11:35:40,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3369940.0, ans=0.125 2023-11-26 11:35:40,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3369940.0, ans=0.125 2023-11-26 11:35:42,265 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 500, loss[loss=0.06954, simple_loss=0.09853, pruned_loss=0.01259, audio_tagging_loss=0.007681, over 15386.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08964, pruned_loss=0.01213, audio_tagging_loss=0.009309, over 2804945.28 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:35:45,754 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:36:05,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2023-11-26 11:36:27,720 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2023-11-26 11:36:35,147 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-26 11:36:38,279 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 550, loss[loss=0.0433, simple_loss=0.05259, pruned_loss=0.006932, audio_tagging_loss=0.01008, over 15122.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08955, pruned_loss=0.0121, audio_tagging_loss=0.009263, over 2866301.63 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:36:38,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3370340.0, ans=0.125 2023-11-26 11:36:44,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2023-11-26 11:36:46,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. 
limit=10.0 2023-11-26 11:36:52,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3370406.6666666665, ans=0.2 2023-11-26 11:36:54,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370406.6666666665, ans=0.1 2023-11-26 11:37:01,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3370473.3333333335, ans=0.035 2023-11-26 11:37:13,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3370540.0, ans=0.125 2023-11-26 11:37:14,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3370540.0, ans=0.125 2023-11-26 11:37:15,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3370540.0, ans=0.0 2023-11-26 11:37:19,740 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.880e+01 9.459e+01 1.018e+02 1.226e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 11:37:28,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3370606.6666666665, ans=0.2 2023-11-26 11:37:31,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-26 11:37:34,736 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 600, loss[loss=0.07223, simple_loss=0.1054, pruned_loss=0.0123, audio_tagging_loss=0.007217, over 15474.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09039, pruned_loss=0.01218, audio_tagging_loss=0.00915, over 2911238.75 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:37:41,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3370673.3333333335, ans=0.2 2023-11-26 11:37:49,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3370740.0, ans=0.125 2023-11-26 11:38:07,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3370873.3333333335, ans=0.125 2023-11-26 11:38:16,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3370873.3333333335, ans=0.125 2023-11-26 11:38:27,358 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-26 11:38:30,489 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 650, loss[loss=0.06693, simple_loss=0.08418, pruned_loss=0.01381, audio_tagging_loss=0.01103, over 15121.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08958, pruned_loss=0.01207, audio_tagging_loss=0.009203, over 2939553.49 frames. 
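The scaling.py:213 entries each report the current value (ans) of a ScheduledFloat, a hyperparameter that varies with batch_count; by batch_count = 3.37e6 most of them have settled at constants such as 0.0, 0.035, 0.125, or 0.2. A minimal sketch of such a schedule as piecewise-linear interpolation over batch count; the breakpoints below are hypothetical, not the ones behind the values above:

import bisect

class PiecewiseLinearSchedule:
    """Value that interpolates linearly between (batch_count, value)
    breakpoints. Illustrative stand-in for the ScheduledFloat values
    printed above; the actual schedules may differ."""
    def __init__(self, points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical schedule: a dropout-like skip rate decaying early in training.
skip_rate = PiecewiseLinearSchedule([(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)])
print(skip_rate(3368673.33))  # far past the last breakpoint -> 0.0

This matches the observed behaviour: at this point in training the printed ans values sit at their final constants.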
], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:38:32,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3371006.6666666665, ans=0.125 2023-11-26 11:38:32,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3371006.6666666665, ans=0.0 2023-11-26 11:38:32,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3371006.6666666665, ans=0.125 2023-11-26 11:38:34,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3371006.6666666665, ans=0.125 2023-11-26 11:38:55,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3371140.0, ans=0.0 2023-11-26 11:39:12,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.620e+01 9.335e+01 1.001e+02 1.278e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 11:39:22,908 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-26 11:39:26,075 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 700, loss[loss=0.0597, simple_loss=0.08934, pruned_loss=0.007343, audio_tagging_loss=0.007688, over 15437.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0906, pruned_loss=0.01222, audio_tagging_loss=0.009045, over 2967283.16 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:39:35,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3371340.0, ans=0.2 2023-11-26 11:39:40,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3371406.6666666665, ans=0.0 2023-11-26 11:40:19,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-26 11:40:22,313 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 750, loss[loss=0.07718, simple_loss=0.1042, pruned_loss=0.01782, audio_tagging_loss=0.007258, over 14931.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09021, pruned_loss=0.01218, audio_tagging_loss=0.009054, over 2986425.97 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:40:26,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3371673.3333333335, ans=0.125 2023-11-26 11:40:44,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-26 11:40:56,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3371873.3333333335, ans=0.04949747468305833 2023-11-26 11:41:03,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.569e+01 9.106e+01 9.803e+01 1.361e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 11:41:06,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3371940.0, ans=0.125 2023-11-26 11:41:15,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-26 11:41:19,110 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 800, loss[loss=0.07291, simple_loss=0.1027, pruned_loss=0.01418, audio_tagging_loss=0.00739, over 14811.00 frames. 
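Each optim.py:476 line summarizes recent gradient norms as five quantiles (min, quartiles, max) plus a clipping threshold and the fraction of clipped batches. The printed thresholds track twice the median (in the most recent line above, 2 x 9.335e+01 = 1.867e+02, exactly the printed threshold), consistent with Clipping_scale=2.0. A hedged sketch of that bookkeeping; the window over which the real optimizer collects norms is an assumption here:

import torch

def grad_norm_summary(norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quantile summary of recent gradient norms, plus a clip threshold.

    Assumed rule: threshold = clipping_scale * median of recent norms.
    This illustrates the log format above, not the exact optim.py code.
    """
    qs = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2].item()
    clipped = (norms > threshold).float().mean().item() * 100.0
    print("grad-norm quartiles "
          + " ".join(f"{q.item():.3e}" for q in qs)
          + f", threshold={threshold:.3e}, percent-clipped={clipped:.1f}")

grad_norm_summary(torch.tensor([75.2, 88.1, 92.0, 99.4, 141.0]))

With a max around 1.3e+02 and a threshold near 1.87e+02, no recent batch exceeds the threshold, which is why percent-clipped=0.0 on almost every line here.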
], tot_loss[loss=0.06652, simple_loss=0.09048, pruned_loss=0.01226, audio_tagging_loss=0.009018, over 3003471.00 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:41:24,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-26 11:41:28,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3372073.3333333335, ans=0.0 2023-11-26 11:41:29,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3372073.3333333335, ans=0.125 2023-11-26 11:41:34,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3372073.3333333335, ans=0.0 2023-11-26 11:41:41,416 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. limit=10.0 2023-11-26 11:41:44,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3372140.0, ans=0.1 2023-11-26 11:41:45,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3372140.0, ans=0.1 2023-11-26 11:41:53,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3372206.6666666665, ans=0.125 2023-11-26 11:42:11,424 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505850 2023-11-26 11:42:14,550 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 850, loss[loss=0.06688, simple_loss=0.08585, pruned_loss=0.01448, audio_tagging_loss=0.009479, over 15129.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.0909, pruned_loss=0.01242, audio_tagging_loss=0.00905, over 3008993.50 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:42:42,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-11-26 11:42:45,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3372473.3333333335, ans=0.125 2023-11-26 11:42:47,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3372540.0, ans=0.0 2023-11-26 11:42:54,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3372540.0, ans=0.125 2023-11-26 11:42:56,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.767e+01 9.422e+01 1.006e+02 1.207e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 11:43:00,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3372606.6666666665, ans=0.0 2023-11-26 11:43:07,283 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505900 2023-11-26 11:43:10,999 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 900, loss[loss=0.06521, simple_loss=0.08868, pruned_loss=0.01315, audio_tagging_loss=0.007719, over 15034.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09104, pruned_loss=0.01233, audio_tagging_loss=0.008984, over 3020341.00 frames. 
], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:43:16,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3372673.3333333335, ans=0.0 2023-11-26 11:43:18,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3372673.3333333335, ans=0.05 2023-11-26 11:44:04,585 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505950 2023-11-26 11:44:07,738 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 950, loss[loss=0.05309, simple_loss=0.0671, pruned_loss=0.0108, audio_tagging_loss=0.008746, over 14358.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09058, pruned_loss=0.01206, audio_tagging_loss=0.008987, over 3025600.57 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:44:24,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3373073.3333333335, ans=0.05 2023-11-26 11:44:35,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3373140.0, ans=0.0 2023-11-26 11:44:38,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3373140.0, ans=0.125 2023-11-26 11:44:45,021 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:44:48,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.709e+01 9.307e+01 9.774e+01 1.284e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 11:44:52,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3373273.3333333335, ans=0.125 2023-11-26 11:44:59,657 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506000 2023-11-26 11:45:00,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.56 vs. limit=22.5 2023-11-26 11:45:03,181 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1000, loss[loss=0.06178, simple_loss=0.07399, pruned_loss=0.01293, audio_tagging_loss=0.01185, over 14609.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09069, pruned_loss=0.01216, audio_tagging_loss=0.008819, over 3029938.65 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:45:06,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3373340.0, ans=0.125 2023-11-26 11:45:27,321 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 11:45:38,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3373540.0, ans=0.0 2023-11-26 11:45:55,487 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506050 2023-11-26 11:45:58,636 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1050, loss[loss=0.06152, simple_loss=0.08428, pruned_loss=0.01137, audio_tagging_loss=0.008015, over 14938.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08872, pruned_loss=0.01212, audio_tagging_loss=0.008838, over 3028388.51 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:10,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3373740.0, ans=0.0 2023-11-26 11:46:33,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3373873.3333333335, ans=0.2 2023-11-26 11:46:40,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.798e+01 9.171e+01 9.764e+01 1.249e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 11:46:41,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3373873.3333333335, ans=0.0 2023-11-26 11:46:42,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3373940.0, ans=0.0 2023-11-26 11:46:49,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3373940.0, ans=0.125 2023-11-26 11:46:52,180 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506100 2023-11-26 11:46:53,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=22.5 2023-11-26 11:46:55,359 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1100, loss[loss=0.05146, simple_loss=0.07717, pruned_loss=0.006645, audio_tagging_loss=0.00623, over 16273.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08867, pruned_loss=0.01193, audio_tagging_loss=0.008664, over 3042523.74 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:56,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3374006.6666666665, ans=0.125 2023-11-26 11:46:57,478 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 11:47:14,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3374073.3333333335, ans=0.125 2023-11-26 11:47:30,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3374206.6666666665, ans=0.125 2023-11-26 11:47:37,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3374206.6666666665, ans=0.1 2023-11-26 11:47:47,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506150 2023-11-26 11:47:48,823 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:47:50,745 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1150, loss[loss=0.07665, simple_loss=0.1108, pruned_loss=0.01255, audio_tagging_loss=0.008698, over 16098.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08923, pruned_loss=0.01201, audio_tagging_loss=0.008638, over 3039791.06 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:48:01,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3374406.6666666665, ans=0.5 2023-11-26 11:48:05,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3374406.6666666665, ans=0.125 2023-11-26 11:48:13,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2023-11-26 11:48:16,543 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-26 11:48:21,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3374473.3333333335, ans=0.0 2023-11-26 11:48:33,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.776e+01 9.671e+01 1.060e+02 1.290e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 11:48:36,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3374606.6666666665, ans=0.0 2023-11-26 11:48:43,889 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506200 2023-11-26 11:48:47,306 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1200, loss[loss=0.06885, simple_loss=0.0957, pruned_loss=0.01301, audio_tagging_loss=0.007994, over 16787.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08958, pruned_loss=0.01204, audio_tagging_loss=0.008548, over 3037739.54 frames. 
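The WARNING lines above drop AudioSet clips whose 100 input frames shrink to 23 after the frontend's roughly 4x subsampling, fewer than the 24 BPE tokens of the placeholder transcript; a transducer cannot emit more tokens than it has encoder frames, so the cut is excluded. A sketch of such a filter, assuming the common icefall frame-count convention; the exact predicate lives in train_asr.py:

def frames_after_subsampling(t: int) -> int:
    # A common icefall convention for the conv frontend (~4x reduction).
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts the transducer cannot align: fewer encoder frames than
    output tokens. Illustrative; the real rule may differ in detail."""
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))           # 23, as in the warnings above
print(keep_cut(num_frames=100, num_tokens=24)) # False -> cut excluded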
], batch size: 63, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:48:58,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3374740.0, ans=0.125 2023-11-26 11:49:03,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3374740.0, ans=0.0 2023-11-26 11:49:07,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3374740.0, ans=0.035 2023-11-26 11:49:10,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3374806.6666666665, ans=0.0 2023-11-26 11:49:40,159 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506250 2023-11-26 11:49:43,816 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1250, loss[loss=0.09008, simple_loss=0.1166, pruned_loss=0.02136, audio_tagging_loss=0.01042, over 15151.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09068, pruned_loss=0.0123, audio_tagging_loss=0.008587, over 3037076.23 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:49:49,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3375006.6666666665, ans=0.1 2023-11-26 11:50:01,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-26 11:50:01,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-26 11:50:26,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.620e+01 9.220e+01 9.934e+01 1.296e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 11:50:36,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506300 2023-11-26 11:50:39,858 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1300, loss[loss=0.05948, simple_loss=0.07835, pruned_loss=0.01018, audio_tagging_loss=0.01012, over 16667.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09064, pruned_loss=0.01232, audio_tagging_loss=0.008599, over 3040747.18 frames. ], batch size: 65, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:50:57,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3375406.6666666665, ans=0.125 2023-11-26 11:51:07,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3375473.3333333335, ans=0.125 2023-11-26 11:51:15,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3375540.0, ans=0.2 2023-11-26 11:51:16,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3375540.0, ans=0.0 2023-11-26 11:51:32,251 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506350 2023-11-26 11:51:32,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3375606.6666666665, ans=0.125 2023-11-26 11:51:36,020 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1350, loss[loss=0.06972, simple_loss=0.09265, pruned_loss=0.01405, audio_tagging_loss=0.009345, over 14825.00 frames. 
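The tot_loss[...] summaries are running statistics rather than per-batch values: their frame counters hover near 3.0e6, move both up and down (3042523.74, 3039791.06, 3037739.54 above), and are fractional, which points to an exponentially decayed accumulator rather than a plain sum. A sketch under that assumption, with a decay window of about 200 batches chosen to reproduce the observed plateau (roughly 15k frames per batch x 200 = 3.0e6):

class DecayingLossTracker:
    """Exponentially decayed sums of (loss * frames) and frames.

    A sketch consistent with the tot_loss lines above; the real tracker
    in the training code may differ in detail.
    """
    def __init__(self, window: int = 200):
        self.decay = 1.0 - 1.0 / window
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = DecayingLossTracker()
for _ in range(2000):
    tracker.update(batch_loss=0.066, batch_frames=15000)
print(f"tot_loss={tracker.tot_loss:.4f} over {tracker.frames:.2f} frames")
# frames plateaus near 3.0e6, the same order as the printed totals

Decayed sums are non-integer, which also explains fractional counters such as 3037739.54 frames.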
], tot_loss[loss=0.06589, simple_loss=0.09003, pruned_loss=0.01223, audio_tagging_loss=0.00864, over 3035316.13 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:51:37,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3375673.3333333335, ans=0.5 2023-11-26 11:52:13,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3375873.3333333335, ans=0.125 2023-11-26 11:52:15,839 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:52:19,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.796e+01 9.351e+01 9.969e+01 1.266e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 11:52:21,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3375940.0, ans=0.125 2023-11-26 11:52:25,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3375940.0, ans=0.09899494936611666 2023-11-26 11:52:28,746 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506400 2023-11-26 11:52:30,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2023-11-26 11:52:32,100 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1400, loss[loss=0.06457, simple_loss=0.08412, pruned_loss=0.01286, audio_tagging_loss=0.009651, over 16069.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08959, pruned_loss=0.01227, audio_tagging_loss=0.008733, over 3037392.20 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:52:40,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3376006.6666666665, ans=10.0 2023-11-26 11:52:48,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3376073.3333333335, ans=0.1 2023-11-26 11:52:49,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3376073.3333333335, ans=0.125 2023-11-26 11:52:56,977 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:53:02,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3376140.0, ans=0.05 2023-11-26 11:53:15,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2023-11-26 11:53:25,012 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506450 2023-11-26 11:53:28,647 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1450, loss[loss=0.05081, simple_loss=0.06774, pruned_loss=0.00675, audio_tagging_loss=0.01019, over 15301.00 frames. 
], tot_loss[loss=0.06532, simple_loss=0.08846, pruned_loss=0.01217, audio_tagging_loss=0.008915, over 3034870.72 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:53:34,652 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.21 vs. limit=10.0 2023-11-26 11:53:44,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3376406.6666666665, ans=0.0 2023-11-26 11:53:49,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3376473.3333333335, ans=0.0 2023-11-26 11:54:12,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.794e+01 9.262e+01 1.013e+02 1.743e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 11:54:15,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3376606.6666666665, ans=0.125 2023-11-26 11:54:19,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3376606.6666666665, ans=0.125 2023-11-26 11:54:20,649 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506500 2023-11-26 11:54:23,738 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1500, loss[loss=0.08203, simple_loss=0.1065, pruned_loss=0.01731, audio_tagging_loss=0.01147, over 15899.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08893, pruned_loss=0.01244, audio_tagging_loss=0.008928, over 3033854.67 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:54:31,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3376673.3333333335, ans=0.125 2023-11-26 11:54:38,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3376740.0, ans=0.2 2023-11-26 11:54:41,766 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5 2023-11-26 11:54:54,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3376806.6666666665, ans=0.125 2023-11-26 11:54:55,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3376806.6666666665, ans=0.2 2023-11-26 11:55:02,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3376873.3333333335, ans=0.2 2023-11-26 11:55:14,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3376940.0, ans=0.1 2023-11-26 11:55:16,595 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506550 2023-11-26 11:55:16,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3376940.0, ans=0.05 2023-11-26 11:55:20,259 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1550, loss[loss=0.05912, simple_loss=0.07493, pruned_loss=0.01323, audio_tagging_loss=0.008425, over 14957.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0903, pruned_loss=0.01275, audio_tagging_loss=0.008941, over 3031762.75 frames. 
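Many of the scheduled names above end in balancer parameters: prob, min_positive, min_abs, max_abs. These configure training-only modules that push activation statistics into a target range, applied with probability prob on a given step. A toy, loss-based stand-in is sketched below; the real balancers in scaling.py act on gradients rather than adding a penalty term, so treat this only as an illustration of the constraints themselves:

import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.05,
                     max_abs: float = 10.0) -> torch.Tensor:
    """Differentiable penalty that is zero while per-channel statistics
    stay inside the target range (last dim = channels). Illustrative
    stand-in for the balancer constraints named in the log above."""
    reduce_dims = tuple(range(x.dim() - 1))
    # Soft fraction of positive entries per channel.
    frac_pos = torch.sigmoid(20.0 * x).mean(dim=reduce_dims)
    pen = torch.relu(min_positive - frac_pos).sum()
    # Mean absolute value per channel, capped at max_abs.
    pen = pen + torch.relu(x.abs().mean(dim=reduce_dims) - max_abs).sum()
    return pen

x = torch.randn(4, 100, 256, requires_grad=True)
loss = x.square().mean() + 0.01 * balancer_penalty(x)
loss.backward()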
], batch size: 57, lr: 1.58e-03, grad_scale: 4.0 2023-11-26 11:55:30,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0 2023-11-26 11:55:38,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.65 vs. limit=15.0 2023-11-26 11:56:01,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3377206.6666666665, ans=0.0 2023-11-26 11:56:06,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.914e+01 9.617e+01 1.050e+02 1.576e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 11:56:13,028 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506600 2023-11-26 11:56:16,424 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1600, loss[loss=0.06721, simple_loss=0.08728, pruned_loss=0.01341, audio_tagging_loss=0.01016, over 14453.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09004, pruned_loss=0.01257, audio_tagging_loss=0.008968, over 3041420.87 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:56:27,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3377406.6666666665, ans=0.125 2023-11-26 11:56:28,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3377406.6666666665, ans=0.125 2023-11-26 11:56:38,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.17 vs. limit=12.0 2023-11-26 11:56:58,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.14 vs. limit=10.0 2023-11-26 11:57:07,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-26 11:57:09,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506650 2023-11-26 11:57:11,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2023-11-26 11:57:12,407 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1650, loss[loss=0.06, simple_loss=0.0891, pruned_loss=0.007067, audio_tagging_loss=0.008381, over 15310.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09086, pruned_loss=0.01259, audio_tagging_loss=0.008972, over 3044712.59 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:57:37,956 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:57:44,399 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:57:55,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. 
limit=15.0 2023-11-26 11:57:58,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3377940.0, ans=0.0 2023-11-26 11:57:58,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.844e+01 9.530e+01 1.032e+02 1.539e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 11:58:05,851 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506700 2023-11-26 11:58:09,010 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1700, loss[loss=0.05095, simple_loss=0.07069, pruned_loss=0.006125, audio_tagging_loss=0.009485, over 14755.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09096, pruned_loss=0.01252, audio_tagging_loss=0.00894, over 3041578.36 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:58:16,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3378006.6666666665, ans=0.0 2023-11-26 11:58:24,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3378073.3333333335, ans=0.125 2023-11-26 11:58:29,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2023-11-26 11:58:54,720 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=15.0 2023-11-26 11:58:59,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3378273.3333333335, ans=0.1 2023-11-26 11:59:02,212 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506750 2023-11-26 11:59:03,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3378273.3333333335, ans=0.07 2023-11-26 11:59:05,376 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1750, loss[loss=0.05818, simple_loss=0.08046, pruned_loss=0.008711, audio_tagging_loss=0.009244, over 16016.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09065, pruned_loss=0.01239, audio_tagging_loss=0.008854, over 3042946.99 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:59:18,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=22.5 2023-11-26 11:59:26,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3378473.3333333335, ans=0.0 2023-11-26 11:59:37,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3378473.3333333335, ans=0.125 2023-11-26 11:59:38,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3378540.0, ans=0.1 2023-11-26 11:59:45,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3378540.0, ans=0.2 2023-11-26 11:59:51,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.598e+01 9.201e+01 1.005e+02 1.270e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:59:57,756 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506800 2023-11-26 12:00:01,108 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1800, loss[loss=0.0689, simple_loss=0.09561, pruned_loss=0.01224, audio_tagging_loss=0.008858, over 15438.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09117, pruned_loss=0.01255, audio_tagging_loss=0.008703, over 3044192.95 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:00:06,415 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2023-11-26 12:00:12,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3378740.0, ans=0.0 2023-11-26 12:00:13,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3378740.0, ans=0.07 2023-11-26 12:00:20,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3378740.0, ans=0.125 2023-11-26 12:00:25,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3378806.6666666665, ans=0.0 2023-11-26 12:00:39,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2023-11-26 12:00:54,403 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506850 2023-11-26 12:00:57,541 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1850, loss[loss=0.05938, simple_loss=0.07673, pruned_loss=0.01198, audio_tagging_loss=0.009038, over 15416.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09107, pruned_loss=0.01261, audio_tagging_loss=0.008626, over 3042987.91 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:01:06,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.92 vs. 
limit=22.5 2023-11-26 12:01:13,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3379073.3333333335, ans=0.125 2023-11-26 12:01:25,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3379140.0, ans=0.125 2023-11-26 12:01:33,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3379206.6666666665, ans=0.1 2023-11-26 12:01:43,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.799e+01 9.499e+01 1.025e+02 1.305e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 12:01:48,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5 2023-11-26 12:01:50,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506900 2023-11-26 12:01:53,961 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1900, loss[loss=0.0666, simple_loss=0.09695, pruned_loss=0.009978, audio_tagging_loss=0.008148, over 16173.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0904, pruned_loss=0.01252, audio_tagging_loss=0.00866, over 3050246.68 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:02:20,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3379473.3333333335, ans=10.0 2023-11-26 12:02:23,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3379473.3333333335, ans=0.125 2023-11-26 12:02:33,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3379540.0, ans=0.0 2023-11-26 12:02:42,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3379606.6666666665, ans=0.1 2023-11-26 12:02:46,118 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506950 2023-11-26 12:02:49,330 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1950, loss[loss=0.07222, simple_loss=0.09451, pruned_loss=0.01508, audio_tagging_loss=0.009889, over 15542.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09024, pruned_loss=0.01259, audio_tagging_loss=0.008622, over 3046650.80 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:02:55,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3379673.3333333335, ans=0.125 2023-11-26 12:03:35,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-26 12:03:35,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.710e+01 9.475e+01 1.012e+02 2.962e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-26 12:03:42,555 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507000 2023-11-26 12:03:45,962 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2000, loss[loss=0.08674, simple_loss=0.1167, pruned_loss=0.01905, audio_tagging_loss=0.009314, over 14755.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09019, pruned_loss=0.01252, audio_tagging_loss=0.00866, over 3038080.83 frames. 
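The scaling.py:1022 lines compare a per-module whiteness statistic against a scheduled limit (metric=20.29 vs. limit=22.5 above); exceeding the limit presumably triggers a corrective, whitening-encouraging gradient. The definition of the printed metric is not visible in this log, so the sketch below uses one plausible stand-in, the top-to-mean eigenvalue ratio of the feature covariance, computed per channel group as in the num_groups fields above:

import torch

def whiteness_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Ratio of top eigenvalue to mean eigenvalue of the feature
    covariance, averaged over channel groups. Near 1 for white features,
    large when a few directions dominate. Illustrative only: the metric
    printed above may be defined differently."""
    x = x.reshape(-1, x.shape[-1])              # (frames, channels)
    metrics = []
    for g in x.T.chunk(num_groups):             # split channels into groups
        feats = g.T - g.T.mean(dim=0)           # (frames, group_channels)
        cov = feats.T @ feats / feats.shape[0]
        eig = torch.linalg.eigvalsh(cov)
        metrics.append((eig.max() / eig.mean()).item())
    return sum(metrics) / len(metrics)

print(whiteness_metric(torch.randn(1000, 256)))  # modest: near-white input
ill = torch.randn(1000, 256) * torch.linspace(0.1, 5.0, 256)
print(whiteness_metric(ill))                     # much larger: skewed spectrum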
], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:03:49,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3380006.6666666665, ans=0.125 2023-11-26 12:04:00,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3380073.3333333335, ans=10.0 2023-11-26 12:04:11,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3380140.0, ans=0.125 2023-11-26 12:04:32,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3380273.3333333335, ans=0.0 2023-11-26 12:04:39,518 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507050 2023-11-26 12:04:42,692 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2050, loss[loss=0.0446, simple_loss=0.05756, pruned_loss=0.003904, audio_tagging_loss=0.01192, over 15205.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09032, pruned_loss=0.01256, audio_tagging_loss=0.008677, over 3040922.64 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:05:14,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3380540.0, ans=0.0 2023-11-26 12:05:28,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.61 vs. limit=15.0 2023-11-26 12:05:28,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.859e+01 9.273e+01 1.013e+02 1.302e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:05:34,984 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507100 2023-11-26 12:05:38,117 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2100, loss[loss=0.06577, simple_loss=0.08736, pruned_loss=0.01187, audio_tagging_loss=0.01022, over 14760.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09047, pruned_loss=0.01262, audio_tagging_loss=0.008634, over 3040444.44 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:05:43,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-11-26 12:05:59,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-11-26 12:06:12,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=22.5 2023-11-26 12:06:20,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-26 12:06:20,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3380873.3333333335, ans=0.0 2023-11-26 12:06:30,270 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507150 2023-11-26 12:06:30,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3380940.0, ans=0.125 2023-11-26 12:06:33,900 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2150, loss[loss=0.06609, simple_loss=0.08533, pruned_loss=0.0144, audio_tagging_loss=0.009021, over 15815.00 frames. 
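Since the train_asr.py:1235 summaries follow a fixed layout, loss curves can be scraped straight back out of a log like this one. A small parser keyed on that layout; it assumes each summary sits on a single unwrapped line:

import re

LINE = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
    r"tot_loss\[loss=(?P<loss>[\d.]+), simple_loss=(?P<simple>[\d.]+), "
    r"pruned_loss=(?P<pruned>[\d.]+), audio_tagging_loss=(?P<at>[\d.]+), "
    r"over (?P<frames>[\d.]+) frames\. \]"
)

def parse(log_text: str):
    """Yield (epoch, batch, tot_loss) tuples from a training log."""
    for m in LINE.finditer(log_text):
        yield int(m["epoch"]), int(m["batch"]), float(m["loss"])

sample = ("2023-11-26 12:08:26,131 INFO [train_asr.py:1235] (2/4) Epoch 43, "
          "batch 2250, loss[loss=0.06221, simple_loss=0.08353, "
          "pruned_loss=0.01063, audio_tagging_loss=0.009815, over 16067.00 "
          "frames. ], tot_loss[loss=0.06672, simple_loss=0.09104, "
          "pruned_loss=0.01254, audio_tagging_loss=0.008661, over 3040151.50 "
          "frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0")
print(list(parse(sample)))  # [(43, 2250, 0.06672)]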
], tot_loss[loss=0.06659, simple_loss=0.09061, pruned_loss=0.0126, audio_tagging_loss=0.008683, over 3047141.15 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:06:59,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3381140.0, ans=0.1 2023-11-26 12:07:01,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.55 vs. limit=22.5 2023-11-26 12:07:04,000 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.26 vs. limit=15.0 2023-11-26 12:07:07,563 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:07:16,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.52 vs. limit=22.5 2023-11-26 12:07:19,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.634e+01 9.242e+01 1.004e+02 1.355e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 12:07:26,696 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507200 2023-11-26 12:07:30,656 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2200, loss[loss=0.06427, simple_loss=0.08555, pruned_loss=0.01289, audio_tagging_loss=0.008609, over 15934.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09033, pruned_loss=0.0125, audio_tagging_loss=0.008716, over 3038476.82 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:07:41,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3381406.6666666665, ans=0.125 2023-11-26 12:07:46,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-26 12:07:53,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3381473.3333333335, ans=0.125 2023-11-26 12:07:55,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3381473.3333333335, ans=0.125 2023-11-26 12:07:56,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3381473.3333333335, ans=0.0 2023-11-26 12:08:06,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3381540.0, ans=0.125 2023-11-26 12:08:09,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3381540.0, ans=0.0 2023-11-26 12:08:16,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.08 vs. 
limit=15.0 2023-11-26 12:08:18,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3381606.6666666665, ans=0.2 2023-11-26 12:08:23,031 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507250 2023-11-26 12:08:26,131 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2250, loss[loss=0.06221, simple_loss=0.08353, pruned_loss=0.01063, audio_tagging_loss=0.009815, over 16067.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09104, pruned_loss=0.01254, audio_tagging_loss=0.008661, over 3040151.50 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:08:29,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3381673.3333333335, ans=0.125 2023-11-26 12:08:29,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-11-26 12:08:41,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3381740.0, ans=0.125 2023-11-26 12:08:50,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3381806.6666666665, ans=0.125 2023-11-26 12:08:54,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3381806.6666666665, ans=0.1 2023-11-26 12:09:06,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3381873.3333333335, ans=0.125 2023-11-26 12:09:11,481 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.710e+01 9.301e+01 1.010e+02 1.448e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 12:09:14,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3381940.0, ans=0.125 2023-11-26 12:09:17,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507300 2023-11-26 12:09:21,645 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2300, loss[loss=0.06972, simple_loss=0.09933, pruned_loss=0.01391, audio_tagging_loss=0.006142, over 15496.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09143, pruned_loss=0.01253, audio_tagging_loss=0.008665, over 3046552.77 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:09:25,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2023-11-26 12:09:49,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3382140.0, ans=0.125 2023-11-26 12:09:50,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3382140.0, ans=0.2 2023-11-26 12:10:10,769 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 12:10:11,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.94 vs. limit=10.0 2023-11-26 12:10:14,611 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507350 2023-11-26 12:10:17,688 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2350, loss[loss=0.05846, simple_loss=0.08333, pruned_loss=0.008355, audio_tagging_loss=0.008439, over 15446.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09124, pruned_loss=0.01255, audio_tagging_loss=0.008721, over 3051745.84 frames. ], batch size: 63, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:10:23,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0 2023-11-26 12:10:31,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3382406.6666666665, ans=0.0 2023-11-26 12:10:39,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3382473.3333333335, ans=0.125 2023-11-26 12:11:02,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3382606.6666666665, ans=0.125 2023-11-26 12:11:04,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.890e+01 9.561e+01 1.021e+02 1.290e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 12:11:11,214 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507400 2023-11-26 12:11:14,680 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2400, loss[loss=0.07423, simple_loss=0.1041, pruned_loss=0.01462, audio_tagging_loss=0.007557, over 15808.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09104, pruned_loss=0.0126, audio_tagging_loss=0.008851, over 3055367.78 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 12:12:04,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3382940.0, ans=0.1 2023-11-26 12:12:06,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507450 2023-11-26 12:12:09,768 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2450, loss[loss=0.06259, simple_loss=0.08116, pruned_loss=0.01268, audio_tagging_loss=0.009332, over 15156.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09168, pruned_loss=0.01268, audio_tagging_loss=0.008871, over 3051781.47 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:12:35,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3383140.0, ans=0.125 2023-11-26 12:12:35,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. 
limit=15.0 2023-11-26 12:12:48,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3383206.6666666665, ans=0.0 2023-11-26 12:12:50,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3383206.6666666665, ans=0.0 2023-11-26 12:12:57,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.728e+01 9.307e+01 9.934e+01 1.225e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:13:02,385 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507500 2023-11-26 12:13:06,028 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2500, loss[loss=0.07852, simple_loss=0.1101, pruned_loss=0.01418, audio_tagging_loss=0.009284, over 15824.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09101, pruned_loss=0.01253, audio_tagging_loss=0.008971, over 3049797.40 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:13:13,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3383340.0, ans=0.1 2023-11-26 12:13:33,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3383473.3333333335, ans=0.125 2023-11-26 12:13:54,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3383606.6666666665, ans=0.125 2023-11-26 12:13:59,021 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507550 2023-11-26 12:13:59,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.78 vs. limit=10.0 2023-11-26 12:14:02,169 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2550, loss[loss=0.06697, simple_loss=0.08895, pruned_loss=0.01107, audio_tagging_loss=0.01143, over 15774.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09038, pruned_loss=0.01246, audio_tagging_loss=0.008894, over 3046330.91 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:14:19,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3383740.0, ans=0.125 2023-11-26 12:14:22,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.96 vs. 
limit=10.0 2023-11-26 12:14:22,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3383740.0, ans=0.0 2023-11-26 12:14:24,932 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:14:31,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3383806.6666666665, ans=0.125 2023-11-26 12:14:48,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383940.0, ans=0.1 2023-11-26 12:14:49,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.661e+01 9.276e+01 1.004e+02 1.739e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:14:54,983 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507600 2023-11-26 12:14:56,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-26 12:14:57,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3384006.6666666665, ans=0.125 2023-11-26 12:14:58,339 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2600, loss[loss=0.06672, simple_loss=0.0912, pruned_loss=0.01408, audio_tagging_loss=0.007033, over 15197.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08956, pruned_loss=0.01238, audio_tagging_loss=0.008811, over 3042943.71 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:15:07,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384006.6666666665, ans=0.1 2023-11-26 12:15:29,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3384140.0, ans=0.125 2023-11-26 12:15:32,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3384206.6666666665, ans=0.2 2023-11-26 12:15:36,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3384206.6666666665, ans=0.125 2023-11-26 12:15:51,093 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507650 2023-11-26 12:15:54,228 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2650, loss[loss=0.05198, simple_loss=0.06732, pruned_loss=0.008913, audio_tagging_loss=0.009403, over 15511.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08964, pruned_loss=0.01227, audio_tagging_loss=0.008726, over 3051176.60 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:16:15,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384473.3333333335, ans=0.1 2023-11-26 12:16:24,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. 
limit=15.0 2023-11-26 12:16:42,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.705e+01 9.342e+01 1.013e+02 1.276e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:16:47,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507700 2023-11-26 12:16:50,470 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2700, loss[loss=0.06892, simple_loss=0.09178, pruned_loss=0.0133, audio_tagging_loss=0.009734, over 15205.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08932, pruned_loss=0.01232, audio_tagging_loss=0.008702, over 3045916.45 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:17:06,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3384740.0, ans=0.0 2023-11-26 12:17:22,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.29 vs. limit=10.0 2023-11-26 12:17:27,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3384873.3333333335, ans=0.0 2023-11-26 12:17:31,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3384873.3333333335, ans=0.125 2023-11-26 12:17:42,634 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507750 2023-11-26 12:17:45,890 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2750, loss[loss=0.08052, simple_loss=0.1077, pruned_loss=0.01882, audio_tagging_loss=0.007846, over 13660.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0896, pruned_loss=0.0125, audio_tagging_loss=0.008739, over 3051860.84 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:17:57,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3385073.3333333335, ans=0.2 2023-11-26 12:18:26,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3385206.6666666665, ans=0.09899494936611666 2023-11-26 12:18:34,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.797e+01 9.310e+01 1.006e+02 1.204e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 12:18:36,019 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:18:39,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507800 2023-11-26 12:18:42,661 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2800, loss[loss=0.07198, simple_loss=0.09863, pruned_loss=0.01375, audio_tagging_loss=0.008915, over 14619.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08861, pruned_loss=0.01223, audio_tagging_loss=0.008774, over 3046240.97 frames. 
], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:18:46,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3385340.0, ans=0.125 2023-11-26 12:18:47,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3385340.0, ans=0.125 2023-11-26 12:18:54,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3385406.6666666665, ans=0.125 2023-11-26 12:18:59,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.70 vs. limit=10.0 2023-11-26 12:19:04,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3385473.3333333335, ans=0.125 2023-11-26 12:19:36,156 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507850 2023-11-26 12:19:39,344 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2850, loss[loss=0.06437, simple_loss=0.09019, pruned_loss=0.01056, audio_tagging_loss=0.008721, over 15423.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08873, pruned_loss=0.01233, audio_tagging_loss=0.008743, over 3040894.75 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:19:43,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3385673.3333333335, ans=0.0 2023-11-26 12:19:46,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3385673.3333333335, ans=0.125 2023-11-26 12:20:02,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3385806.6666666665, ans=0.0 2023-11-26 12:20:06,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3385806.6666666665, ans=0.0 2023-11-26 12:20:18,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2023-11-26 12:20:28,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.713e+01 9.303e+01 9.917e+01 1.324e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:20:28,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3385940.0, ans=0.0 2023-11-26 12:20:31,669 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507900 2023-11-26 12:20:34,805 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2900, loss[loss=0.04018, simple_loss=0.05632, pruned_loss=0.004564, audio_tagging_loss=0.007455, over 14804.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08854, pruned_loss=0.01239, audio_tagging_loss=0.008729, over 3037898.75 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:20:37,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3386006.6666666665, ans=0.0 2023-11-26 12:20:48,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. 
limit=15.0 2023-11-26 12:20:51,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3386073.3333333335, ans=0.125 2023-11-26 12:20:56,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2023-11-26 12:20:58,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0 2023-11-26 12:21:08,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3386206.6666666665, ans=0.0 2023-11-26 12:21:08,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3386206.6666666665, ans=0.125 2023-11-26 12:21:09,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3386206.6666666665, ans=0.0 2023-11-26 12:21:10,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386206.6666666665, ans=0.1 2023-11-26 12:21:27,888 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507950 2023-11-26 12:21:31,523 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2950, loss[loss=0.08318, simple_loss=0.1163, pruned_loss=0.01784, audio_tagging_loss=0.007178, over 15312.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08912, pruned_loss=0.01239, audio_tagging_loss=0.008713, over 3041367.53 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:21:46,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=22.5 2023-11-26 12:21:52,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-26 12:21:55,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3386473.3333333335, ans=0.125 2023-11-26 12:22:10,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3386540.0, ans=0.125 2023-11-26 12:22:20,945 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.672e+01 9.532e+01 9.988e+01 1.402e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 12:22:24,241 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508000 2023-11-26 12:22:24,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3386606.6666666665, ans=0.2 2023-11-26 12:22:30,157 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3000, loss[loss=0.06407, simple_loss=0.085, pruned_loss=0.01274, audio_tagging_loss=0.008825, over 14944.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09003, pruned_loss=0.01248, audio_tagging_loss=0.008784, over 3038634.61 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:22:30,158 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 12:23:02,721 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05754, simple_loss=0.05056, pruned_loss=0.00524, audio_tagging_loss=0.02702, over 4681554.00 frames. 
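A note on the loss fields in these records: every loss[...] / tot_loss[...] entry reports the total alongside simple_loss, pruned_loss and audio_tagging_loss, and the totals are consistent with a fixed linear combination. A minimal Python sketch follows; the scale names and values (0.5 for the simple loss, 1.0 for the tagging loss) are inferred from the printed numbers, not confirmed from the recipe code. For the validation record just above: 0.5 * 0.05056 + 0.00524 + 1.0 * 0.02702 = 0.05754, the logged total.

    def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Down-weight the smoothed "simple" transducer loss against the
        # pruned RNN-T loss, and add the auxiliary audio-tagging loss.
        # The two scale values are inferred from the log, not from code.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)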
2023-11-26 12:23:02,721 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 12:23:14,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3386740.0, ans=0.0 2023-11-26 12:23:32,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3386806.6666666665, ans=0.1 2023-11-26 12:23:32,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3386806.6666666665, ans=0.125 2023-11-26 12:23:51,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.11 vs. limit=15.0 2023-11-26 12:23:55,263 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508050 2023-11-26 12:23:56,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3386940.0, ans=0.2 2023-11-26 12:23:58,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3387006.6666666665, ans=0.125 2023-11-26 12:23:58,983 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3050, loss[loss=0.05226, simple_loss=0.06851, pruned_loss=0.007289, audio_tagging_loss=0.01071, over 14643.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09011, pruned_loss=0.01246, audio_tagging_loss=0.008837, over 3044024.06 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:24:32,790 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:24:48,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.074e+01 8.532e+01 9.331e+01 1.001e+02 1.251e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 12:24:52,529 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508100 2023-11-26 12:24:53,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3387273.3333333335, ans=0.0
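The Exclude-cut WARNING above shows the length filter at work: AudioSet placeholder cuts carry the dummy transcript, and a cut is dropped when its BPE token count exceeds the number of encoder frames that survive subsampling. A minimal sketch of that check, assuming a typical icefall-style length formula for roughly 4x convolutional subsampling (the exact formula is an assumption, but it does map the logged 100 input frames to 23):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames left after ~4x conv subsampling: ((100 - 7) // 2 + 1) // 2 == 23.
        t = ((num_frames - 7) // 2 + 1) // 2
        # A transducer cannot emit more tokens than it has encoder frames,
        # so 24 tokens vs. 23 frames means the cut is excluded.
        return t >= num_tokens

    assert not keep_cut(100, 24)  # matches the WARNING above

2023-11-26 12:24:55,662 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3100, loss[loss=0.07739, simple_loss=0.09656, pruned_loss=0.01966, audio_tagging_loss=0.009444, over 15221.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09059, pruned_loss=0.01241, audio_tagging_loss=0.00885, over 3048862.80 frames.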
], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:25:16,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3387473.3333333335, ans=0.0 2023-11-26 12:25:29,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3387540.0, ans=0.0 2023-11-26 12:25:32,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3387540.0, ans=0.125 2023-11-26 12:25:46,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3387606.6666666665, ans=0.2 2023-11-26 12:25:47,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508150 2023-11-26 12:25:50,632 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3150, loss[loss=0.07666, simple_loss=0.09964, pruned_loss=0.01906, audio_tagging_loss=0.007784, over 14895.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09087, pruned_loss=0.01259, audio_tagging_loss=0.008874, over 3043849.67 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:25:50,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3387673.3333333335, ans=0.0 2023-11-26 12:26:06,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3387740.0, ans=0.1 2023-11-26 12:26:15,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3387806.6666666665, ans=0.125 2023-11-26 12:26:22,086 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.542e-03 2023-11-26 12:26:25,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2023-11-26 12:26:34,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2023-11-26 12:26:37,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3387940.0, ans=0.0 2023-11-26 12:26:38,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3387940.0, ans=0.125 2023-11-26 12:26:39,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.956e+01 9.437e+01 1.017e+02 1.314e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 12:26:43,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508200 2023-11-26 12:26:44,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-26 12:26:47,021 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3200, loss[loss=0.07499, simple_loss=0.09167, pruned_loss=0.01748, audio_tagging_loss=0.01168, over 14626.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09085, pruned_loss=0.01263, audio_tagging_loss=0.009057, over 3041463.76 frames. 
], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:26:48,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3388006.6666666665, ans=0.2 2023-11-26 12:27:03,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0 2023-11-26 12:27:05,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3388073.3333333335, ans=0.125 2023-11-26 12:27:05,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-26 12:27:18,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388140.0, ans=0.1 2023-11-26 12:27:40,474 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508250 2023-11-26 12:27:44,162 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3250, loss[loss=0.06943, simple_loss=0.0903, pruned_loss=0.01538, audio_tagging_loss=0.008897, over 16407.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09111, pruned_loss=0.01262, audio_tagging_loss=0.009015, over 3044180.12 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:27:51,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3388340.0, ans=0.05 2023-11-26 12:28:20,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3388540.0, ans=0.1 2023-11-26 12:28:24,166 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:28:33,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.911e+01 9.477e+01 1.008e+02 1.285e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 12:28:34,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3388606.6666666665, ans=0.125 2023-11-26 12:28:36,978 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508300 2023-11-26 12:28:39,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-11-26 12:28:40,128 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3300, loss[loss=0.05207, simple_loss=0.06942, pruned_loss=0.009908, audio_tagging_loss=0.007457, over 16061.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09054, pruned_loss=0.0125, audio_tagging_loss=0.009065, over 3053368.98 frames. 
], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:28:55,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3388740.0, ans=0.125 2023-11-26 12:29:03,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3388806.6666666665, ans=0.0 2023-11-26 12:29:29,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3388940.0, ans=0.125 2023-11-26 12:29:32,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.51 vs. limit=10.0 2023-11-26 12:29:32,638 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508350 2023-11-26 12:29:35,765 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3350, loss[loss=0.07594, simple_loss=0.1009, pruned_loss=0.01686, audio_tagging_loss=0.008622, over 14980.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09041, pruned_loss=0.0125, audio_tagging_loss=0.009082, over 3049687.96 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:29:40,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=22.5 2023-11-26 12:29:46,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3389073.3333333335, ans=0.2 2023-11-26 12:29:47,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2023-11-26 12:29:55,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3389073.3333333335, ans=0.125 2023-11-26 12:30:15,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3389206.6666666665, ans=0.125 2023-11-26 12:30:17,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2023-11-26 12:30:25,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.755e+01 9.551e+01 1.033e+02 1.237e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 12:30:29,029 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508400 2023-11-26 12:30:32,999 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3400, loss[loss=0.06504, simple_loss=0.09331, pruned_loss=0.01236, audio_tagging_loss=0.006028, over 15433.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09068, pruned_loss=0.01261, audio_tagging_loss=0.008915, over 3046593.99 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:30:40,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3389340.0, ans=0.125 2023-11-26 12:30:41,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3389340.0, ans=0.125 2023-11-26 12:30:43,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3389406.6666666665, ans=0.125 2023-11-26 12:31:20,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3389606.6666666665, ans=0.125 2023-11-26 12:31:25,708 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508450 2023-11-26 12:31:28,825 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3450, loss[loss=0.06064, simple_loss=0.08084, pruned_loss=0.009578, audio_tagging_loss=0.01064, over 14470.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0902, pruned_loss=0.01241, audio_tagging_loss=0.008828, over 3045576.06 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:31:32,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-26 12:31:58,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3389806.6666666665, ans=0.0 2023-11-26 12:32:11,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2023-11-26 12:32:16,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-26 12:32:17,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3389940.0, ans=0.0 2023-11-26 12:32:18,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.804e+01 9.534e+01 1.051e+02 1.288e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 12:32:21,346 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508500 2023-11-26 12:32:25,071 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3500, loss[loss=0.1077, simple_loss=0.1497, pruned_loss=0.02644, audio_tagging_loss=0.006351, over 14980.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09064, pruned_loss=0.01255, audio_tagging_loss=0.008734, over 3049119.19 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:32:27,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3390006.6666666665, ans=0.125 2023-11-26 12:32:33,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3390006.6666666665, ans=0.0 2023-11-26 12:32:35,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390073.3333333335, ans=0.1 2023-11-26 12:32:35,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3390073.3333333335, ans=0.2 2023-11-26 12:32:49,421 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:32:55,555 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:32:56,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3390140.0, ans=0.025 2023-11-26 12:32:59,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3390206.6666666665, ans=0.125 2023-11-26 12:33:12,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3390273.3333333335, ans=0.125 2023-11-26 12:33:17,920 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508550 2023-11-26 12:33:18,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3390273.3333333335, ans=0.125 2023-11-26 12:33:21,109 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3550, loss[loss=0.07369, simple_loss=0.1, pruned_loss=0.01575, audio_tagging_loss=0.007927, over 16432.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09031, pruned_loss=0.01253, audio_tagging_loss=0.008773, over 3050115.48 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:33:37,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs. 
limit=10.0 2023-11-26 12:33:48,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3390473.3333333335, ans=0.1 2023-11-26 12:34:06,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3390606.6666666665, ans=0.125 2023-11-26 12:34:10,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.734e+01 9.266e+01 9.991e+01 1.201e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 12:34:14,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508600 2023-11-26 12:34:18,192 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3600, loss[loss=0.06479, simple_loss=0.0835, pruned_loss=0.01355, audio_tagging_loss=0.009494, over 14866.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08988, pruned_loss=0.01239, audio_tagging_loss=0.008756, over 3045154.56 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:34:25,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3390673.3333333335, ans=0.125 2023-11-26 12:34:44,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3390806.6666666665, ans=0.2 2023-11-26 12:34:47,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3390806.6666666665, ans=0.2 2023-11-26 12:34:57,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3390873.3333333335, ans=0.0 2023-11-26 12:35:10,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508650 2023-11-26 12:35:12,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3391006.6666666665, ans=0.0 2023-11-26 12:35:13,507 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3650, loss[loss=0.0462, simple_loss=0.0658, pruned_loss=0.004377, audio_tagging_loss=0.008929, over 14797.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08938, pruned_loss=0.01231, audio_tagging_loss=0.008752, over 3043418.91 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:35:23,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391073.3333333335, ans=0.1 2023-11-26 12:35:40,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3391140.0, ans=0.125 2023-11-26 12:35:43,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3391140.0, ans=0.1 2023-11-26 12:35:51,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3391206.6666666665, ans=0.125 2023-11-26 12:36:03,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.759e+01 9.340e+01 1.006e+02 1.350e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:36:06,542 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508700
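The many scaling.py ScheduledFloat records above each report a value (ans=...) at the current batch_count, which reads like a per-parameter schedule evaluated over training batches. A hypothetical reimplementation as a piecewise-linear schedule follows; the breakpoints in the example are illustrative only, not taken from the recipe:

    def scheduled_float(batch_count, schedule):
        # schedule: sorted (batch_count, value) breakpoints. Values are
        # linearly interpolated between breakpoints and held constant
        # outside the covered range.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return schedule[-1][1]

    # This deep into training (batch_count around 3.39e6), any schedule that
    # flattens after a few tens of thousands of batches has long since
    # reached its final value, consistent with the constant ans values here.
    print(scheduled_float(3391140.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1

2023-11-26 12:36:10,109 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3700, loss[loss=0.06199, simple_loss=0.08132, pruned_loss=0.01149, audio_tagging_loss=0.009834, over 16107.00 frames.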
], tot_loss[loss=0.06622, simple_loss=0.09017, pruned_loss=0.01243, audio_tagging_loss=0.008704, over 3048784.44 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:36:20,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2023-11-26 12:36:37,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3391473.3333333335, ans=0.1 2023-11-26 12:36:45,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3391540.0, ans=0.07 2023-11-26 12:36:48,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3391540.0, ans=0.0 2023-11-26 12:36:59,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3391606.6666666665, ans=0.0 2023-11-26 12:36:59,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3391606.6666666665, ans=0.125 2023-11-26 12:37:02,929 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508750 2023-11-26 12:37:06,112 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3750, loss[loss=0.08059, simple_loss=0.1215, pruned_loss=0.01262, audio_tagging_loss=0.007233, over 14709.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09017, pruned_loss=0.01242, audio_tagging_loss=0.008712, over 3059331.57 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:37:21,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3391740.0, ans=0.2 2023-11-26 12:37:34,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3391806.6666666665, ans=0.2 2023-11-26 12:37:35,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3391806.6666666665, ans=0.125 2023-11-26 12:37:36,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-11-26 12:37:46,463 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:37:53,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3391940.0, ans=0.2 2023-11-26 12:37:54,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.936e+01 9.506e+01 1.051e+02 1.254e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 12:37:58,566 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508800 2023-11-26 12:38:01,936 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3800, loss[loss=0.05743, simple_loss=0.07939, pruned_loss=0.008428, audio_tagging_loss=0.009306, over 13865.00 frames. 
], tot_loss[loss=0.06604, simple_loss=0.08986, pruned_loss=0.01232, audio_tagging_loss=0.008786, over 3055606.98 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:38:04,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3392006.6666666665, ans=0.2 2023-11-26 12:38:14,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3392073.3333333335, ans=0.125 2023-11-26 12:38:54,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508850 2023-11-26 12:38:57,865 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3850, loss[loss=0.06319, simple_loss=0.08395, pruned_loss=0.01392, audio_tagging_loss=0.007294, over 14748.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09016, pruned_loss=0.01249, audio_tagging_loss=0.008836, over 3057186.45 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:39:02,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3392340.0, ans=0.1 2023-11-26 12:39:11,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3392406.6666666665, ans=0.1 2023-11-26 12:39:36,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.31 vs. limit=10.0 2023-11-26 12:39:41,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2023-11-26 12:39:43,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3392606.6666666665, ans=0.1 2023-11-26 12:39:49,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.737e+01 9.345e+01 1.016e+02 1.351e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 12:39:51,319 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508900 2023-11-26 12:39:54,434 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3900, loss[loss=0.06614, simple_loss=0.0895, pruned_loss=0.01226, audio_tagging_loss=0.009129, over 14256.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09012, pruned_loss=0.01255, audio_tagging_loss=0.00882, over 3052393.60 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:40:35,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3392873.3333333335, ans=0.125 2023-11-26 12:40:37,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3392873.3333333335, ans=0.125 2023-11-26 12:40:42,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3392940.0, ans=0.2 2023-11-26 12:40:45,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3392940.0, ans=0.125 2023-11-26 12:40:46,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508950 2023-11-26 12:40:47,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3392940.0, ans=0.125 2023-11-26 12:40:49,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3393006.6666666665, ans=0.125 2023-11-26 12:40:49,981 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3950, loss[loss=0.06405, simple_loss=0.08315, pruned_loss=0.0132, audio_tagging_loss=0.009284, over 15693.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0905, pruned_loss=0.0125, audio_tagging_loss=0.008882, over 3055596.78 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:41:10,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3393073.3333333335, ans=0.07 2023-11-26 12:41:29,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3393206.6666666665, ans=0.0 2023-11-26 12:41:37,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0 2023-11-26 12:41:40,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.928e+01 9.625e+01 1.027e+02 1.240e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 12:41:43,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509000 2023-11-26 12:41:46,551 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4000, loss[loss=0.0563, simple_loss=0.07329, pruned_loss=0.01079, audio_tagging_loss=0.008865, over 15087.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09044, pruned_loss=0.01253, audio_tagging_loss=0.008965, over 3044729.70 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:41:47,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3393340.0, ans=0.125 2023-11-26 12:42:10,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3393473.3333333335, ans=0.0 2023-11-26 12:42:18,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3393540.0, ans=0.2 2023-11-26 12:42:23,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3393540.0, ans=0.0 2023-11-26 12:42:39,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509050 2023-11-26 12:42:43,136 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4050, loss[loss=0.06488, simple_loss=0.09329, pruned_loss=0.01014, audio_tagging_loss=0.008095, over 15376.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09092, pruned_loss=0.01254, audio_tagging_loss=0.008987, over 3038053.59 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:42:47,336 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:42:48,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2023-11-26 12:43:02,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3393740.0, ans=0.125 2023-11-26 12:43:13,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3393806.6666666665, ans=0.125 2023-11-26 12:43:23,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3393873.3333333335, ans=0.0 2023-11-26 12:43:33,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.904e+01 9.389e+01 9.930e+01 1.705e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 12:43:35,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509100 2023-11-26 12:43:38,787 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4100, loss[loss=0.06384, simple_loss=0.08499, pruned_loss=0.0113, audio_tagging_loss=0.01005, over 15277.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0906, pruned_loss=0.01231, audio_tagging_loss=0.009, over 3038375.95 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:43:50,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=22.5 2023-11-26 12:43:58,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394073.3333333335, ans=0.1 2023-11-26 12:44:09,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.83 vs. 
limit=15.0 2023-11-26 12:44:12,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2023-11-26 12:44:19,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=12.0 2023-11-26 12:44:30,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509150 2023-11-26 12:44:34,589 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4150, loss[loss=0.05338, simple_loss=0.07053, pruned_loss=0.007932, audio_tagging_loss=0.01019, over 14697.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09022, pruned_loss=0.01222, audio_tagging_loss=0.008914, over 3039098.67 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:45:15,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3394540.0, ans=0.0 2023-11-26 12:45:17,400 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:45:25,396 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.865e+01 9.444e+01 1.016e+02 1.308e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-26 12:45:27,620 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509200 2023-11-26 12:45:31,515 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4200, loss[loss=0.07494, simple_loss=0.1057, pruned_loss=0.0129, audio_tagging_loss=0.009186, over 15886.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08918, pruned_loss=0.01203, audio_tagging_loss=0.008835, over 3040993.85 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:45:36,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3394673.3333333335, ans=0.125 2023-11-26 12:45:42,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3394740.0, ans=0.0 2023-11-26 12:45:43,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.11 vs. limit=15.0 2023-11-26 12:45:58,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3394806.6666666665, ans=0.0 2023-11-26 12:45:58,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3394806.6666666665, ans=0.0 2023-11-26 12:46:05,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3394873.3333333335, ans=0.0 2023-11-26 12:46:19,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=15.0 2023-11-26 12:46:24,009 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509250 2023-11-26 12:46:27,117 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4250, loss[loss=0.06028, simple_loss=0.08133, pruned_loss=0.009941, audio_tagging_loss=0.009676, over 15702.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08907, pruned_loss=0.012, audio_tagging_loss=0.008703, over 3044392.75 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:46:31,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3395006.6666666665, ans=0.0 2023-11-26 12:46:39,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3395073.3333333335, ans=0.125 2023-11-26 12:46:43,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-26 12:46:46,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3395073.3333333335, ans=0.125 2023-11-26 12:46:47,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3395073.3333333335, ans=0.125 2023-11-26 12:47:11,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3395273.3333333335, ans=0.1 2023-11-26 12:47:16,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3395273.3333333335, ans=0.125 2023-11-26 12:47:18,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.812e+01 8.820e+01 9.502e+01 1.020e+02 1.301e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 12:47:19,172 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509300 2023-11-26 12:47:22,844 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4300, loss[loss=0.0586, simple_loss=0.07591, pruned_loss=0.01152, audio_tagging_loss=0.009121, over 14644.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08957, pruned_loss=0.01206, audio_tagging_loss=0.008684, over 3043224.66 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:47:24,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3395340.0, ans=0.125 2023-11-26 12:47:26,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3395340.0, ans=0.025 2023-11-26 12:47:33,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3395406.6666666665, ans=0.125 2023-11-26 12:47:33,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3395406.6666666665, ans=0.125 2023-11-26 12:47:41,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3395406.6666666665, ans=0.0
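The scaling.py Whitening records compare a per-module statistic against a limit (metric=... vs. limit=...), suggesting a regularizer that only activates once feature covariances drift too far from white. One plausible form of such a metric, sketched below, is the eigenvalue-spread ratio mean(l_i^2) / mean(l_i)^2 of the grouped channel covariance, which equals 1.0 for perfectly white features and grows as the spectrum becomes lopsided; whether scaling.py computes exactly this is an assumption.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels) activations from one module.
        n, c = x.shape
        g = c // num_groups
        xg = (x - x.mean(dim=0)).reshape(n, num_groups, g).transpose(0, 1)
        cov = xg.transpose(1, 2) @ xg / n               # (num_groups, g, g)
        sum_eig_sq = (cov * cov).sum(dim=(1, 2))        # ||cov||_F^2 = sum of l_i^2
        sum_eig = cov.diagonal(dim1=1, dim2=2).sum(-1)  # trace = sum of l_i
        # mean(l_i^2) / mean(l_i)^2, averaged over channel groups
        return ((sum_eig_sq / g) / (sum_eig / g) ** 2).mean().item()

2023-11-26 12:48:03,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs.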
limit=10.0 2023-11-26 12:48:05,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3395540.0, ans=0.2 2023-11-26 12:48:09,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3395606.6666666665, ans=0.0 2023-11-26 12:48:12,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-11-26 12:48:16,093 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509350 2023-11-26 12:48:19,149 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4350, loss[loss=0.06512, simple_loss=0.09081, pruned_loss=0.01234, audio_tagging_loss=0.007385, over 16229.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08976, pruned_loss=0.01215, audio_tagging_loss=0.008634, over 3043316.98 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:48:21,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2023-11-26 12:48:34,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3395740.0, ans=0.1 2023-11-26 12:48:46,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3395806.6666666665, ans=0.125 2023-11-26 12:49:10,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.017e+01 9.685e+01 1.042e+02 1.339e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-26 12:49:11,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3395940.0, ans=0.0 2023-11-26 12:49:11,942 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509400 2023-11-26 12:49:15,347 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4400, loss[loss=0.0867, simple_loss=0.1126, pruned_loss=0.02163, audio_tagging_loss=0.008798, over 15031.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09018, pruned_loss=0.01226, audio_tagging_loss=0.008638, over 3048368.48 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:49:27,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3396073.3333333335, ans=0.125 2023-11-26 12:50:02,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3396273.3333333335, ans=0.2 2023-11-26 12:50:07,542 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509450 2023-11-26 12:50:10,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0 2023-11-26 12:50:10,718 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4450, loss[loss=0.05411, simple_loss=0.06784, pruned_loss=0.009365, audio_tagging_loss=0.01083, over 16431.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08957, pruned_loss=0.01237, audio_tagging_loss=0.008624, over 3050437.96 frames. 
], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:50:12,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3396340.0, ans=0.125 2023-11-26 12:50:22,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3396406.6666666665, ans=0.125 2023-11-26 12:50:26,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3396406.6666666665, ans=0.1 2023-11-26 12:50:31,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3396406.6666666665, ans=0.125 2023-11-26 12:51:03,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.973e+01 9.425e+01 1.012e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 12:51:03,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509500 2023-11-26 12:51:07,257 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4500, loss[loss=0.05994, simple_loss=0.08823, pruned_loss=0.009152, audio_tagging_loss=0.006672, over 15027.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08876, pruned_loss=0.01227, audio_tagging_loss=0.00867, over 3040102.93 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:51:08,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3396673.3333333335, ans=0.2 2023-11-26 12:51:15,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3396673.3333333335, ans=0.95 2023-11-26 12:51:22,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0 2023-11-26 12:51:39,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3396873.3333333335, ans=0.125 2023-11-26 12:51:56,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3396940.0, ans=0.125 2023-11-26 12:51:58,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-26 12:51:59,858 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509550 2023-11-26 12:51:59,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3396940.0, ans=0.95
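In the optim.py records above, the five grad-norm numbers read naturally as the min / 25% / median / 75% / max of a window of recent gradient norms, and the printed threshold consistently equals Clipping_scale times the median (2.0 * 9.425e+01 = 1.885e+02 in the entry above). A sketch of that bookkeeping follows; the windowing details are assumptions, even though the threshold relation matches the logged values:

    import torch

    def grad_norm_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: 1-D tensor of per-batch gradient norms.
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 * median, as in the log lines
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped

2023-11-26 12:52:03,032 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4550, loss[loss=0.06498, simple_loss=0.09301, pruned_loss=0.01236, audio_tagging_loss=0.006122, over 14805.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08913, pruned_loss=0.01236, audio_tagging_loss=0.008701, over 3037761.98 frames.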
], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:52:06,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3397006.6666666665, ans=0.125 2023-11-26 12:52:08,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3397006.6666666665, ans=0.0 2023-11-26 12:52:08,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3397006.6666666665, ans=0.125 2023-11-26 12:52:08,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-26 12:52:38,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3397206.6666666665, ans=0.2 2023-11-26 12:52:42,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3397206.6666666665, ans=0.125 2023-11-26 12:52:47,788 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:52:50,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3397273.3333333335, ans=0.125 2023-11-26 12:52:52,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3397273.3333333335, ans=0.04949747468305833 2023-11-26 12:52:55,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509600 2023-11-26 12:52:56,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.693e+01 9.289e+01 1.006e+02 1.287e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 12:52:58,667 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4600, loss[loss=0.05861, simple_loss=0.07671, pruned_loss=0.01301, audio_tagging_loss=0.007248, over 15572.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.0899, pruned_loss=0.01252, audio_tagging_loss=0.00873, over 3037819.93 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:53:06,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3397340.0, ans=0.04949747468305833 2023-11-26 12:53:21,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3397473.3333333335, ans=0.0 2023-11-26 12:53:35,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3397540.0, ans=22.5 2023-11-26 12:53:46,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. 
limit=15.0 2023-11-26 12:53:51,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509650 2023-11-26 12:53:52,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3397606.6666666665, ans=0.1 2023-11-26 12:53:54,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2023-11-26 12:53:54,964 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4650, loss[loss=0.0784, simple_loss=0.1026, pruned_loss=0.01707, audio_tagging_loss=0.01004, over 15546.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08984, pruned_loss=0.01258, audio_tagging_loss=0.008731, over 3040427.15 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:53:55,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3397673.3333333335, ans=0.0 2023-11-26 12:54:11,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3397740.0, ans=0.125 2023-11-26 12:54:22,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3397806.6666666665, ans=0.2 2023-11-26 12:54:31,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3397873.3333333335, ans=0.0 2023-11-26 12:54:46,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3397940.0, ans=0.125 2023-11-26 12:54:48,061 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509700 2023-11-26 12:54:49,075 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.687e+01 9.484e+01 1.034e+02 1.399e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 12:54:51,719 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4700, loss[loss=0.062, simple_loss=0.0891, pruned_loss=0.009862, audio_tagging_loss=0.007589, over 15626.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08978, pruned_loss=0.01244, audio_tagging_loss=0.008827, over 3041020.98 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:55:44,100 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509750 2023-11-26 12:55:45,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3398273.3333333335, ans=0.125 2023-11-26 12:55:47,207 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4750, loss[loss=0.04905, simple_loss=0.05865, pruned_loss=0.005996, audio_tagging_loss=0.01373, over 15703.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08933, pruned_loss=0.01234, audio_tagging_loss=0.00887, over 3031622.14 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:55:47,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. 
limit=15.0 2023-11-26 12:56:04,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3398406.6666666665, ans=0.125 2023-11-26 12:56:14,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3398473.3333333335, ans=0.125 2023-11-26 12:56:32,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3398606.6666666665, ans=0.125 2023-11-26 12:56:39,947 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509800 2023-11-26 12:56:40,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.623e+01 9.231e+01 9.879e+01 9.064e+02, threshold=1.846e+02, percent-clipped=2.0 2023-11-26 12:56:43,263 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4800, loss[loss=0.06202, simple_loss=0.08889, pruned_loss=0.01008, audio_tagging_loss=0.00749, over 15469.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08943, pruned_loss=0.01241, audio_tagging_loss=0.00893, over 3036419.25 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:56:43,444 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:56:47,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2023-11-26 12:56:58,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3398740.0, ans=0.0 2023-11-26 12:57:22,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3398873.3333333335, ans=0.0 2023-11-26 12:57:36,385 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509850 2023-11-26 12:57:39,519 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4850, loss[loss=0.0622, simple_loss=0.09112, pruned_loss=0.008202, audio_tagging_loss=0.008439, over 15652.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08875, pruned_loss=0.01219, audio_tagging_loss=0.009003, over 3043501.11 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:57:40,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399006.6666666665, ans=0.1 2023-11-26 12:57:46,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3399006.6666666665, ans=0.125 2023-11-26 12:58:12,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3399206.6666666665, ans=0.2 2023-11-26 12:58:18,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=12.0 2023-11-26 12:58:29,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3399273.3333333335, ans=0.125 2023-11-26 12:58:30,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.05 vs. 
limit=22.5 2023-11-26 12:58:31,578 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509900 2023-11-26 12:58:33,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.785e+01 9.504e+01 1.038e+02 1.484e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 12:58:35,206 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4900, loss[loss=0.07324, simple_loss=0.1049, pruned_loss=0.01323, audio_tagging_loss=0.007554, over 14861.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08959, pruned_loss=0.01231, audio_tagging_loss=0.00899, over 3043839.66 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:58:44,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3399340.0, ans=0.025 2023-11-26 12:59:08,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3399540.0, ans=0.0 2023-11-26 12:59:09,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-26 12:59:12,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3399540.0, ans=10.0 2023-11-26 12:59:17,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3399540.0, ans=0.0 2023-11-26 12:59:27,651 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509950 2023-11-26 12:59:30,722 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4950, loss[loss=0.06792, simple_loss=0.09409, pruned_loss=0.01252, audio_tagging_loss=0.008361, over 14900.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08933, pruned_loss=0.01212, audio_tagging_loss=0.008846, over 3037890.80 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:59:45,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3399740.0, ans=0.0 2023-11-26 12:59:54,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3399806.6666666665, ans=0.125 2023-11-26 12:59:54,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3399806.6666666665, ans=0.0 2023-11-26 13:00:03,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3399873.3333333335, ans=0.125 2023-11-26 13:00:05,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3399873.3333333335, ans=0.125 2023-11-26 13:00:14,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3399940.0, ans=0.125 2023-11-26 13:00:23,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510000 2023-11-26 13:00:24,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.558e+01 9.135e+01 1.006e+02 1.501e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 13:00:27,001 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5000, loss[loss=0.05251, simple_loss=0.07805, pruned_loss=0.008359, audio_tagging_loss=0.005123, over 13837.00 frames. 
], tot_loss[loss=0.06547, simple_loss=0.0892, pruned_loss=0.01219, audio_tagging_loss=0.008683, over 3043842.68 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:00:36,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3400073.3333333335, ans=0.09899494936611666 2023-11-26 13:00:38,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3400073.3333333335, ans=0.0 2023-11-26 13:01:02,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3400206.6666666665, ans=0.05 2023-11-26 13:01:14,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=22.5 2023-11-26 13:01:18,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510050 2023-11-26 13:01:19,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2023-11-26 13:01:21,752 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5050, loss[loss=0.07512, simple_loss=0.1006, pruned_loss=0.01777, audio_tagging_loss=0.007043, over 14830.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08872, pruned_loss=0.01204, audio_tagging_loss=0.008622, over 3038441.16 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:01:40,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3400406.6666666665, ans=0.0 2023-11-26 13:02:01,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400540.0, ans=0.1 2023-11-26 13:02:14,031 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510100 2023-11-26 13:02:14,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.485e+01 9.179e+01 9.722e+01 1.214e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 13:02:17,648 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5100, loss[loss=0.04923, simple_loss=0.06351, pruned_loss=0.008145, audio_tagging_loss=0.009333, over 16284.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08768, pruned_loss=0.01188, audio_tagging_loss=0.008647, over 3039578.78 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:02:19,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3400673.3333333335, ans=0.125 2023-11-26 13:02:34,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3400740.0, ans=0.0 2023-11-26 13:03:10,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510150 2023-11-26 13:03:10,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400940.0, ans=0.1 2023-11-26 13:03:13,912 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5150, loss[loss=0.07636, simple_loss=0.1166, pruned_loss=0.01257, audio_tagging_loss=0.005485, over 15506.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08853, pruned_loss=0.01201, audio_tagging_loss=0.008601, over 3038249.22 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:03:14,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0 2023-11-26 13:03:19,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3401006.6666666665, ans=0.125 2023-11-26 13:03:22,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3401006.6666666665, ans=0.0 2023-11-26 13:03:47,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3401206.6666666665, ans=0.0 2023-11-26 13:04:06,049 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510200 2023-11-26 13:04:06,966 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.866e+01 9.575e+01 1.024e+02 1.389e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 13:04:09,400 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5200, loss[loss=0.08628, simple_loss=0.1131, pruned_loss=0.02211, audio_tagging_loss=0.007633, over 15048.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08978, pruned_loss=0.0122, audio_tagging_loss=0.008604, over 3043606.08 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:04:09,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3401340.0, ans=0.1 2023-11-26 13:04:11,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.00 vs. limit=10.0 2023-11-26 13:04:27,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3401406.6666666665, ans=0.0 2023-11-26 13:04:29,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3401406.6666666665, ans=0.0 2023-11-26 13:04:47,554 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:05:01,180 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510250 2023-11-26 13:05:04,285 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5250, loss[loss=0.08585, simple_loss=0.1101, pruned_loss=0.02172, audio_tagging_loss=0.009081, over 14852.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08991, pruned_loss=0.01224, audio_tagging_loss=0.008564, over 3034698.70 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:05:10,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-26 13:05:14,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-26 13:05:37,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3401873.3333333335, ans=0.125 2023-11-26 13:05:54,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. 
limit=15.0 2023-11-26 13:05:58,249 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510300 2023-11-26 13:06:01,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.828e+01 9.543e+01 1.025e+02 1.295e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 13:06:01,341 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5300, loss[loss=0.06419, simple_loss=0.07811, pruned_loss=0.01418, audio_tagging_loss=0.01096, over 14862.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09026, pruned_loss=0.01241, audio_tagging_loss=0.008632, over 3035094.51 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:06:01,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3402006.6666666665, ans=0.2 2023-11-26 13:06:02,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0 2023-11-26 13:06:09,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3402006.6666666665, ans=0.125 2023-11-26 13:06:14,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3402073.3333333335, ans=0.125 2023-11-26 13:06:15,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3402073.3333333335, ans=0.125 2023-11-26 13:06:36,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3402206.6666666665, ans=0.0 2023-11-26 13:06:45,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.62 vs. limit=10.0 2023-11-26 13:06:54,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510350 2023-11-26 13:06:57,233 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5350, loss[loss=0.08098, simple_loss=0.1113, pruned_loss=0.01666, audio_tagging_loss=0.008694, over 14916.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09045, pruned_loss=0.01239, audio_tagging_loss=0.008582, over 3036895.69 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:07:25,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3402473.3333333335, ans=0.0 2023-11-26 13:07:38,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402540.0, ans=0.1 2023-11-26 13:07:49,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510400 2023-11-26 13:07:52,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.825e+01 9.466e+01 1.015e+02 1.457e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 13:07:52,641 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5400, loss[loss=0.07754, simple_loss=0.1051, pruned_loss=0.01439, audio_tagging_loss=0.01061, over 14797.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09058, pruned_loss=0.01238, audio_tagging_loss=0.00869, over 3031808.03 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:07:57,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3402673.3333333335, ans=10.0 2023-11-26 13:07:58,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3402673.3333333335, ans=15.0 2023-11-26 13:08:03,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3402740.0, ans=0.2 2023-11-26 13:08:04,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3402740.0, ans=0.125 2023-11-26 13:08:18,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3402806.6666666665, ans=0.125 2023-11-26 13:08:29,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2023-11-26 13:08:45,448 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510450 2023-11-26 13:08:49,164 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5450, loss[loss=0.08067, simple_loss=0.1137, pruned_loss=0.01803, audio_tagging_loss=0.005798, over 15811.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09045, pruned_loss=0.01243, audio_tagging_loss=0.008741, over 3032646.51 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:08:51,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3403006.6666666665, ans=0.125 2023-11-26 13:08:51,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2023-11-26 13:09:02,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2023-11-26 13:09:08,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=22.5 2023-11-26 13:09:22,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-26 13:09:24,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3403206.6666666665, ans=0.1 2023-11-26 13:09:41,540 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510500 2023-11-26 13:09:44,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 8.724e+01 9.189e+01 1.004e+02 1.414e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-26 13:09:44,688 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5500, loss[loss=0.06614, simple_loss=0.09473, pruned_loss=0.01002, audio_tagging_loss=0.008755, over 14395.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09038, pruned_loss=0.01237, audio_tagging_loss=0.008751, over 3043022.31 frames. 
], batch size: 53, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:09:51,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3403340.0, ans=0.125 2023-11-26 13:10:01,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3403406.6666666665, ans=0.125 2023-11-26 13:10:03,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3403406.6666666665, ans=0.125 2023-11-26 13:10:10,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3403473.3333333335, ans=10.0 2023-11-26 13:10:11,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2023-11-26 13:10:18,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3403540.0, ans=0.95 2023-11-26 13:10:35,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3403606.6666666665, ans=0.0 2023-11-26 13:10:37,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510550 2023-11-26 13:10:40,232 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5550, loss[loss=0.05667, simple_loss=0.06502, pruned_loss=0.009299, audio_tagging_loss=0.01486, over 16154.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09002, pruned_loss=0.01247, audio_tagging_loss=0.008926, over 3038559.72 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:10:42,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-26 13:10:48,373 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:10:49,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-26 13:10:55,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3403740.0, ans=0.125 2023-11-26 13:11:04,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.07 vs. limit=12.0 2023-11-26 13:11:20,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3403873.3333333335, ans=0.125 2023-11-26 13:11:22,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. 
limit=6.0 2023-11-26 13:11:27,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3403940.0, ans=0.125 2023-11-26 13:11:27,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3403940.0, ans=0.125 2023-11-26 13:11:28,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3403940.0, ans=0.125 2023-11-26 13:11:29,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3403940.0, ans=0.125 2023-11-26 13:11:32,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3403940.0, ans=0.125 2023-11-26 13:11:33,014 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510600 2023-11-26 13:11:36,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.941e+01 9.582e+01 1.033e+02 2.288e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-26 13:11:36,421 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5600, loss[loss=0.06154, simple_loss=0.08392, pruned_loss=0.009526, audio_tagging_loss=0.01006, over 15806.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09014, pruned_loss=0.01247, audio_tagging_loss=0.00898, over 3047508.03 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:11:41,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3404006.6666666665, ans=0.125 2023-11-26 13:11:44,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0 2023-11-26 13:11:45,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3404006.6666666665, ans=0.125 2023-11-26 13:11:50,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3404073.3333333335, ans=10.0 2023-11-26 13:11:50,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0 2023-11-26 13:11:51,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-11-26 13:12:03,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=22.5 2023-11-26 13:12:08,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3404140.0, ans=0.0 2023-11-26 13:12:18,016 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 13:12:30,140 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510650 2023-11-26 13:12:30,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3404273.3333333335, ans=0.125 2023-11-26 13:12:33,259 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5650, loss[loss=0.06278, simple_loss=0.07771, pruned_loss=0.01155, audio_tagging_loss=0.01238, over 17045.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08965, pruned_loss=0.01232, audio_tagging_loss=0.009117, over 3046732.95 frames. ], batch size: 65, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:12:44,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3404406.6666666665, ans=0.0 2023-11-26 13:12:47,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404406.6666666665, ans=0.1 2023-11-26 13:12:53,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-26 13:13:00,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3404473.3333333335, ans=0.1 2023-11-26 13:13:01,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3404473.3333333335, ans=0.0 2023-11-26 13:13:01,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3404473.3333333335, ans=0.125 2023-11-26 13:13:11,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3404540.0, ans=0.125 2023-11-26 13:13:22,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3404606.6666666665, ans=0.0 2023-11-26 13:13:25,382 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510700 2023-11-26 13:13:27,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3404673.3333333335, ans=0.5 2023-11-26 13:13:28,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.672e+01 9.212e+01 9.928e+01 1.414e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 13:13:28,534 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5700, loss[loss=0.06144, simple_loss=0.07988, pruned_loss=0.01042, audio_tagging_loss=0.01108, over 15067.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09023, pruned_loss=0.01243, audio_tagging_loss=0.009004, over 3048505.60 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:13:28,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3404673.3333333335, ans=0.1 2023-11-26 13:13:30,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3404673.3333333335, ans=0.125 2023-11-26 13:13:34,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3404673.3333333335, ans=0.2 2023-11-26 13:13:51,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.00 vs. 
limit=15.0 2023-11-26 13:13:52,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3404806.6666666665, ans=0.2 2023-11-26 13:14:02,456 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-26 13:14:05,258 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:14:06,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3404873.3333333335, ans=0.125 2023-11-26 13:14:13,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3404940.0, ans=0.0 2023-11-26 13:14:14,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3404940.0, ans=0.1 2023-11-26 13:14:21,388 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510750 2023-11-26 13:14:22,812 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=22.5 2023-11-26 13:14:23,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3405006.6666666665, ans=0.125 2023-11-26 13:14:23,738 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:14:24,487 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5750, loss[loss=0.06433, simple_loss=0.07149, pruned_loss=0.01455, audio_tagging_loss=0.01404, over 14720.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.0902, pruned_loss=0.01263, audio_tagging_loss=0.008953, over 3049688.16 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:14:49,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3405140.0, ans=0.125 2023-11-26 13:15:17,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510800 2023-11-26 13:15:20,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.469e+01 9.302e+01 1.019e+02 1.569e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 13:15:20,847 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5800, loss[loss=0.07328, simple_loss=0.09692, pruned_loss=0.01653, audio_tagging_loss=0.00829, over 14777.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08971, pruned_loss=0.01244, audio_tagging_loss=0.008822, over 3044259.44 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:15:34,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3405406.6666666665, ans=0.05 2023-11-26 13:15:39,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3405406.6666666665, ans=0.125 2023-11-26 13:15:49,589 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:16:13,309 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510850 2023-11-26 13:16:16,492 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5850, loss[loss=0.07651, simple_loss=0.1164, pruned_loss=0.0114, audio_tagging_loss=0.006928, over 16439.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08959, pruned_loss=0.01235, audio_tagging_loss=0.008766, over 3052870.88 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:16:20,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3405673.3333333335, ans=0.0 2023-11-26 13:16:21,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3405673.3333333335, ans=0.125 2023-11-26 13:16:25,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3405673.3333333335, ans=0.0 2023-11-26 13:16:26,979 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2023-11-26 13:17:03,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3405940.0, ans=0.1 2023-11-26 13:17:08,192 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510900 2023-11-26 13:17:11,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.774e+01 9.383e+01 1.009e+02 2.236e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-26 13:17:11,796 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5900, loss[loss=0.05165, simple_loss=0.06938, pruned_loss=0.007641, audio_tagging_loss=0.009317, over 15190.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08993, pruned_loss=0.0122, audio_tagging_loss=0.008711, over 3049866.87 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:17:28,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3406073.3333333335, ans=0.0 2023-11-26 13:17:38,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3406140.0, ans=0.0 2023-11-26 13:18:04,167 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510950 2023-11-26 13:18:07,251 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5950, loss[loss=0.05314, simple_loss=0.07069, pruned_loss=0.007819, audio_tagging_loss=0.009981, over 15875.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09037, pruned_loss=0.01232, audio_tagging_loss=0.008661, over 3050616.07 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:18:12,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.28 vs. 
limit=15.0 2023-11-26 13:18:23,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3406406.6666666665, ans=0.125 2023-11-26 13:18:45,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3406540.0, ans=0.125 2023-11-26 13:18:54,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3406606.6666666665, ans=0.07 2023-11-26 13:18:59,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511000 2023-11-26 13:19:02,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3406673.3333333335, ans=0.125 2023-11-26 13:19:03,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.762e+01 9.206e+01 9.891e+01 1.298e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 13:19:03,733 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6000, loss[loss=0.08039, simple_loss=0.1198, pruned_loss=0.01429, audio_tagging_loss=0.006188, over 16025.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09029, pruned_loss=0.01229, audio_tagging_loss=0.008585, over 3050921.55 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:19:03,734 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 13:19:29,473 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3495, 4.3217, 4.5155, 4.4751], device='cuda:2') 2023-11-26 13:19:36,331 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05784, simple_loss=0.05057, pruned_loss=0.005191, audio_tagging_loss=0.02736, over 4681554.00 frames. 2023-11-26 13:19:36,332 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 13:19:53,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0 2023-11-26 13:20:01,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406806.6666666665, ans=0.1 2023-11-26 13:20:06,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3406806.6666666665, ans=0.125 2023-11-26 13:20:17,566 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 13:20:20,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3406940.0, ans=0.0 2023-11-26 13:20:25,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3406940.0, ans=0.125 2023-11-26 13:20:28,754 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511050 2023-11-26 13:20:32,336 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6050, loss[loss=0.08013, simple_loss=0.1081, pruned_loss=0.01707, audio_tagging_loss=0.009034, over 15559.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08929, pruned_loss=0.01202, audio_tagging_loss=0.008536, over 3047214.31 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:20:44,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3407073.3333333335, ans=0.125 2023-11-26 13:21:06,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=22.5 2023-11-26 13:21:12,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3407206.6666666665, ans=0.125 2023-11-26 13:21:21,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3407273.3333333335, ans=0.125 2023-11-26 13:21:23,753 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511100 2023-11-26 13:21:26,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.64 vs. limit=10.0 2023-11-26 13:21:27,480 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6100, loss[loss=0.05273, simple_loss=0.07059, pruned_loss=0.01038, audio_tagging_loss=0.007057, over 14849.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08892, pruned_loss=0.01216, audio_tagging_loss=0.008612, over 3040303.69 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:21:28,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 8.831e+01 9.526e+01 1.012e+02 1.265e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 13:21:32,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-26 13:22:19,335 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511150 2023-11-26 13:22:22,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3407673.3333333335, ans=0.09899494936611666 2023-11-26 13:22:23,025 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6150, loss[loss=0.06831, simple_loss=0.09374, pruned_loss=0.01149, audio_tagging_loss=0.009948, over 15451.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08979, pruned_loss=0.01246, audio_tagging_loss=0.00856, over 3041222.06 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:22:27,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3407673.3333333335, ans=0.0 2023-11-26 13:22:29,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3407673.3333333335, ans=0.95 2023-11-26 13:22:47,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3407806.6666666665, ans=0.125 2023-11-26 13:23:04,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3407873.3333333335, ans=0.0 2023-11-26 13:23:04,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3407873.3333333335, ans=0.125 2023-11-26 13:23:15,409 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511200 2023-11-26 13:23:19,292 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6200, loss[loss=0.04017, simple_loss=0.05008, pruned_loss=0.005619, audio_tagging_loss=0.00951, over 15134.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08869, pruned_loss=0.01218, audio_tagging_loss=0.008643, over 3041006.03 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:23:20,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.535e+01 9.196e+01 1.004e+02 1.259e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 13:23:33,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-11-26 13:23:40,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3408140.0, ans=0.0 2023-11-26 13:23:42,470 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:23:53,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3408206.6666666665, ans=0.1 2023-11-26 13:24:03,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3408273.3333333335, ans=0.0 2023-11-26 13:24:05,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3408273.3333333335, ans=0.125 2023-11-26 13:24:11,521 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511250 2023-11-26 13:24:14,669 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6250, loss[loss=0.06786, simple_loss=0.08899, pruned_loss=0.01424, audio_tagging_loss=0.009127, over 14114.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08923, pruned_loss=0.01223, audio_tagging_loss=0.00874, over 3035207.74 frames. 
], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:24:17,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3408340.0, ans=0.09899494936611666 2023-11-26 13:24:57,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3408540.0, ans=0.125 2023-11-26 13:25:07,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511300 2023-11-26 13:25:10,839 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6300, loss[loss=0.07121, simple_loss=0.1031, pruned_loss=0.01324, audio_tagging_loss=0.006425, over 15112.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08998, pruned_loss=0.0124, audio_tagging_loss=0.008738, over 3037796.43 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:25:12,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.818e+01 9.508e+01 1.037e+02 1.214e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 13:25:20,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3408673.3333333335, ans=0.0 2023-11-26 13:25:24,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2023-11-26 13:25:29,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3408740.0, ans=0.2 2023-11-26 13:25:29,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2023-11-26 13:25:33,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3408806.6666666665, ans=0.0 2023-11-26 13:25:56,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3408940.0, ans=0.035 2023-11-26 13:25:59,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.61 vs. limit=22.5 2023-11-26 13:26:04,129 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511350 2023-11-26 13:26:07,269 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6350, loss[loss=0.0474, simple_loss=0.06255, pruned_loss=0.007393, audio_tagging_loss=0.008734, over 14786.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08916, pruned_loss=0.01215, audio_tagging_loss=0.008883, over 3037955.43 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:26:11,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3409006.6666666665, ans=0.125 2023-11-26 13:26:11,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3409006.6666666665, ans=0.0 2023-11-26 13:26:40,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3409206.6666666665, ans=0.1 2023-11-26 13:26:59,936 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511400 2023-11-26 13:27:03,270 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6400, loss[loss=0.05877, simple_loss=0.08405, pruned_loss=0.009117, audio_tagging_loss=0.007623, over 15227.00 frames. 
], tot_loss[loss=0.06604, simple_loss=0.08941, pruned_loss=0.01239, audio_tagging_loss=0.008949, over 3039561.15 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:27:04,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.943e+01 9.499e+01 1.031e+02 1.393e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 13:27:13,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3409406.6666666665, ans=0.2 2023-11-26 13:27:22,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3409406.6666666665, ans=0.2 2023-11-26 13:27:36,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3409540.0, ans=0.125 2023-11-26 13:27:39,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3409540.0, ans=0.0 2023-11-26 13:27:40,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3409540.0, ans=0.125 2023-11-26 13:27:48,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3409606.6666666665, ans=0.1 2023-11-26 13:27:55,509 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511450 2023-11-26 13:27:58,636 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6450, loss[loss=0.05823, simple_loss=0.08194, pruned_loss=0.008573, audio_tagging_loss=0.00869, over 16598.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08897, pruned_loss=0.01225, audio_tagging_loss=0.009053, over 3039311.15 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:28:13,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3409740.0, ans=0.125 2023-11-26 13:28:42,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2023-11-26 13:28:51,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511500 2023-11-26 13:28:54,730 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6500, loss[loss=0.0762, simple_loss=0.1001, pruned_loss=0.01855, audio_tagging_loss=0.007603, over 14787.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08852, pruned_loss=0.01216, audio_tagging_loss=0.009031, over 3035963.58 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:28:55,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.812e+01 9.257e+01 9.962e+01 1.246e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 13:29:20,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.76 vs. 
limit=22.5 2023-11-26 13:29:37,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3410206.6666666665, ans=10.0 2023-11-26 13:29:44,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3410273.3333333335, ans=0.0 2023-11-26 13:29:47,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511550 2023-11-26 13:29:47,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3410273.3333333335, ans=0.125 2023-11-26 13:29:48,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3410273.3333333335, ans=0.125 2023-11-26 13:29:49,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3410340.0, ans=0.04949747468305833 2023-11-26 13:29:50,641 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6550, loss[loss=0.07846, simple_loss=0.1092, pruned_loss=0.01788, audio_tagging_loss=0.005985, over 14169.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08881, pruned_loss=0.01204, audio_tagging_loss=0.008828, over 3036432.44 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:30:10,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410406.6666666665, ans=0.125 2023-11-26 13:30:14,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3410473.3333333335, ans=0.125 2023-11-26 13:30:20,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3410473.3333333335, ans=0.125 2023-11-26 13:30:21,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3410473.3333333335, ans=0.125 2023-11-26 13:30:27,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3410540.0, ans=0.125 2023-11-26 13:30:27,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3410540.0, ans=0.2 2023-11-26 13:30:28,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3410540.0, ans=0.125 2023-11-26 13:30:28,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3410540.0, ans=0.125 2023-11-26 13:30:42,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3410606.6666666665, ans=0.0 2023-11-26 13:30:42,996 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511600 2023-11-26 13:30:46,405 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6600, loss[loss=0.06989, simple_loss=0.08604, pruned_loss=0.0168, audio_tagging_loss=0.01008, over 14492.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08895, pruned_loss=0.01218, audio_tagging_loss=0.008843, over 3034561.53 frames. 
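Each ScheduledFloat entry above reports the current value (`ans`) of a named hyperparameter (skip rates, balancer probabilities, dropout rates) as a function of batch_count, i.e. these knobs are annealed over the course of training. A hedged sketch of one plausible mechanism, piecewise-linear interpolation over batch count; the breakpoints below are invented for illustration:

```python
# Piecewise-linear schedule sketch; the breakpoint values are hypothetical,
# only the shape (a value interpolated by batch_count) is asserted here.
class PiecewiseLinear:
    def __init__(self, *points):
        # points: (batch_count, value) pairs sorted by batch_count
        self.points = list(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip rate that decays early in training and then holds at 0.0:
skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(3408673.0))  # 0.0 -- far past the last breakpoint, as in
                             # the attention_skip_rate entries with ans=0.0
```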
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:30:47,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.677e+01 9.398e+01 1.026e+02 1.405e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 13:30:47,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3410673.3333333335, ans=0.035 2023-11-26 13:30:50,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-26 13:30:52,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3410673.3333333335, ans=0.125 2023-11-26 13:30:57,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3410740.0, ans=0.125 2023-11-26 13:30:59,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2023-11-26 13:31:04,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3410740.0, ans=0.125 2023-11-26 13:31:12,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-26 13:31:18,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3410806.6666666665, ans=0.125 2023-11-26 13:31:32,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3410940.0, ans=0.0 2023-11-26 13:31:38,999 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-26 13:31:42,720 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6650, loss[loss=0.05705, simple_loss=0.07624, pruned_loss=0.01194, audio_tagging_loss=0.006992, over 14663.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0892, pruned_loss=0.01221, audio_tagging_loss=0.008709, over 3038375.61 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:31:57,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3411073.3333333335, ans=0.1 2023-11-26 13:32:30,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3411273.3333333335, ans=0.125 2023-11-26 13:32:35,549 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-26 13:32:38,690 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6700, loss[loss=0.06461, simple_loss=0.09442, pruned_loss=0.01128, audio_tagging_loss=0.006117, over 14336.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08968, pruned_loss=0.01225, audio_tagging_loss=0.008574, over 3043742.37 frames. 
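In every Clipping_scale entry above, the printed threshold is exactly Clipping_scale (2.0) times the middle of the five grad-norm statistics, e.g. 2 * 9.398e+01 = 1.880e+02 just above. A sketch under the assumption that the five numbers are min/25%/50%/75%/max of recent gradient norms and percent-clipped counts norms above the threshold; window size and bookkeeping are guesses, not read from optim.py:

```python
import torch

# Hedged sketch of the statistics behind the Clipping_scale log lines.
class GradNormMonitor:
    def __init__(self, clipping_scale=2.0, window=1000):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []  # recent per-step gradient norms

    def record(self, grad_norm: float) -> dict:
        self.norms = (self.norms + [grad_norm])[-self.window:]
        t = torch.tensor(self.norms)
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2x the median
        clipped = (t > threshold).float().mean().item() * 100.0
        return {"quartiles": q.tolist(), "threshold": threshold,
                "percent_clipped": clipped}

m = GradNormMonitor()
for n in [77.68, 88.18, 95.08, 103.7, 121.4]:  # values from batch 6300
    stats = m.record(n)
print(stats["threshold"])  # 190.16 ~= the logged threshold=1.902e+02
```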
], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:32:40,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.754e+01 9.381e+01 1.004e+02 1.497e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 13:32:41,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3411340.0, ans=0.125 2023-11-26 13:32:45,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3411340.0, ans=0.0 2023-11-26 13:32:56,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3411406.6666666665, ans=0.1 2023-11-26 13:33:03,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411473.3333333335, ans=0.0 2023-11-26 13:33:08,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3411473.3333333335, ans=0.2 2023-11-26 13:33:17,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=22.5 2023-11-26 13:33:24,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3411606.6666666665, ans=0.125 2023-11-26 13:33:31,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-26 13:33:32,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3411606.6666666665, ans=0.0 2023-11-26 13:33:33,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-26 13:33:34,180 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6750, loss[loss=0.06552, simple_loss=0.09426, pruned_loss=0.009105, audio_tagging_loss=0.009284, over 15604.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09041, pruned_loss=0.01231, audio_tagging_loss=0.008552, over 3044168.34 frames. 
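The fractional frame counts in the tot_loss entries ("over 3044168.34 frames.") suggest an exponentially decayed running sum rather than a raw total: with roughly 15k frames per batch, a decay constant near 1/200 gives a steady-state mass of about 3M frames, which matches the logged values. A sketch under that assumption (the decay constant is a guess):

```python
# Decayed running-average sketch; the 1/200 decay is an assumption chosen
# to match the ~3M-frame steady state seen in the tot_loss entries.
class RunningLoss:
    def __init__(self, decay=1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0  # decayed sum of per-batch loss * frames
        self.frames = 0.0    # decayed frame count; becomes fractional

    def update(self, batch_loss, num_frames):
        self.loss_sum = self.loss_sum * self.decay + batch_loss * num_frames
        self.frames = self.frames * self.decay + num_frames
        return self.loss_sum / self.frames  # the reported tot_loss

tracker = RunningLoss()
for _ in range(2000):
    tot = tracker.update(0.065, 15000)
print(tracker.frames)  # ~3.0e6, same order as "over 3044168.34 frames."
```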
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:33:46,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3411740.0, ans=0.1 2023-11-26 13:33:58,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3411806.6666666665, ans=0.2 2023-11-26 13:34:05,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3411806.6666666665, ans=0.1 2023-11-26 13:34:13,546 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:34:15,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3411873.3333333335, ans=0.125 2023-11-26 13:34:22,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3411940.0, ans=0.0 2023-11-26 13:34:23,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3411940.0, ans=0.125 2023-11-26 13:34:24,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3411940.0, ans=0.0 2023-11-26 13:34:26,690 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-26 13:34:27,119 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2023-11-26 13:34:30,026 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6800, loss[loss=0.05663, simple_loss=0.06634, pruned_loss=0.01074, audio_tagging_loss=0.01272, over 13815.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09125, pruned_loss=0.01256, audio_tagging_loss=0.00849, over 3047577.56 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:34:31,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3412006.6666666665, ans=0.0 2023-11-26 13:34:32,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.841e+01 9.365e+01 1.006e+02 1.409e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 13:34:33,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2023-11-26 13:34:34,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3412006.6666666665, ans=0.1 2023-11-26 13:34:48,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3412073.3333333335, ans=0.0 2023-11-26 13:35:20,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.62 vs. limit=10.0 2023-11-26 13:35:24,230 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-26 13:35:27,356 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6850, loss[loss=0.07437, simple_loss=0.1065, pruned_loss=0.01453, audio_tagging_loss=0.006609, over 14978.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09051, pruned_loss=0.01238, audio_tagging_loss=0.008527, over 3043474.75 frames. 
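The Whitening entries above compare a per-module metric against a limit (e.g. metric=4.51 vs. limit=15.0), apparently penalizing feature covariances that drift too far from "white". A hedged sketch of a whiteness metric with the right behavior, the ratio mean(lam^2)/mean(lam)^2 over the covariance eigenvalues, which is 1.0 for perfectly white features and grows with eigenvalue spread; whether this is the exact formula in scaling.py is an assumption:

```python
import torch

# Whiteness metric sketch: 1.0 when the covariance is a scaled identity,
# larger when some directions dominate (the per-group application implied
# by num_groups in the log is omitted here for brevity).
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    lam = torch.linalg.eigvalsh(cov)
    return (lam.pow(2).mean() / lam.mean().pow(2)).item()

x = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated features
print(whitening_metric(x))  # >> 1.0: would sit above a whitening limit
```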
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:35:31,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3412340.0, ans=0.04949747468305833 2023-11-26 13:35:31,923 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:35:44,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-26 13:36:19,385 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-26 13:36:22,523 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6900, loss[loss=0.06298, simple_loss=0.08201, pruned_loss=0.01126, audio_tagging_loss=0.01072, over 14415.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09013, pruned_loss=0.01214, audio_tagging_loss=0.008527, over 3037166.75 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:36:23,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3412673.3333333335, ans=0.125 2023-11-26 13:36:24,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.611e+01 9.198e+01 9.954e+01 1.491e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:36:40,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-26 13:37:04,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0 2023-11-26 13:37:08,637 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:37:15,715 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-26 13:37:18,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3413006.6666666665, ans=0.025 2023-11-26 13:37:18,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3413006.6666666665, ans=0.025 2023-11-26 13:37:18,931 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6950, loss[loss=0.04343, simple_loss=0.06211, pruned_loss=0.003455, audio_tagging_loss=0.008921, over 14699.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09034, pruned_loss=0.01202, audio_tagging_loss=0.008556, over 3041197.05 frames. 
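The WARNING above shows why the AudioSet placeholder transcripts force some cuts out of training: after the encoder's roughly 4x subsampling, a 100-frame cut keeps only 23 frames, fewer than the 24 tokens of the dummy text, so a transducer alignment is impossible. A sketch of the arithmetic; the exact subsampling formula below is an assumption that reproduces the logged 100 -> 23:

```python
# Assumed form of the ~4x convolutional subsampling; it reproduces the
# logged "before subsampling: 100 ... after subsampling: 23".
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # transducer training needs at least one output frame per token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning
```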
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:37:28,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3413006.6666666665, ans=0.2 2023-11-26 13:37:32,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3413073.3333333335, ans=0.0 2023-11-26 13:37:33,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3413073.3333333335, ans=0.125 2023-11-26 13:37:36,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413073.3333333335, ans=0.1 2023-11-26 13:37:40,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3413073.3333333335, ans=0.025 2023-11-26 13:38:11,791 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-26 13:38:17,718 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7000, loss[loss=0.05865, simple_loss=0.08033, pruned_loss=0.01068, audio_tagging_loss=0.00781, over 16392.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08983, pruned_loss=0.01206, audio_tagging_loss=0.008624, over 3042596.40 frames. ], batch size: 63, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:38:20,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.732e+01 9.495e+01 1.005e+02 2.082e+02, threshold=1.899e+02, percent-clipped=1.0 2023-11-26 13:38:57,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3413540.0, ans=0.2 2023-11-26 13:38:59,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3413540.0, ans=0.0 2023-11-26 13:39:02,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3413606.6666666665, ans=0.0 2023-11-26 13:39:10,702 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-26 13:39:13,846 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7050, loss[loss=0.08239, simple_loss=0.1051, pruned_loss=0.01925, audio_tagging_loss=0.01058, over 16012.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08963, pruned_loss=0.01201, audio_tagging_loss=0.008691, over 3053697.53 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:39:31,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3413740.0, ans=0.125 2023-11-26 13:39:42,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3413806.6666666665, ans=0.0 2023-11-26 13:39:42,624 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.22 vs. 
limit=22.5 2023-11-26 13:39:52,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-26 13:39:55,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3413873.3333333335, ans=0.0 2023-11-26 13:40:06,121 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-26 13:40:06,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3413940.0, ans=0.2 2023-11-26 13:40:09,754 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7100, loss[loss=0.06285, simple_loss=0.08314, pruned_loss=0.01126, audio_tagging_loss=0.01001, over 15578.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08925, pruned_loss=0.01214, audio_tagging_loss=0.008839, over 3045791.10 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:40:12,858 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.711e+01 9.572e+01 1.021e+02 1.655e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-26 13:40:18,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3414006.6666666665, ans=0.0 2023-11-26 13:40:27,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2023-11-26 13:40:40,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3414140.0, ans=0.05 2023-11-26 13:41:02,410 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-26 13:41:05,579 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7150, loss[loss=0.04692, simple_loss=0.06774, pruned_loss=0.005789, audio_tagging_loss=0.007259, over 15232.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08911, pruned_loss=0.01208, audio_tagging_loss=0.008836, over 3046769.79 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:41:44,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3414540.0, ans=0.0 2023-11-26 13:41:54,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3414606.6666666665, ans=0.2 2023-11-26 13:41:56,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3414606.6666666665, ans=0.125 2023-11-26 13:41:58,315 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-26 13:42:01,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2023-11-26 13:42:02,391 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7200, loss[loss=0.05519, simple_loss=0.07645, pruned_loss=0.006777, audio_tagging_loss=0.01019, over 16192.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08851, pruned_loss=0.01196, audio_tagging_loss=0.009017, over 3047864.98 frames. 
], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:42:03,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3414673.3333333335, ans=0.125 2023-11-26 13:42:03,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3414673.3333333335, ans=0.125 2023-11-26 13:42:05,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.947e+01 9.542e+01 1.037e+02 1.437e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 13:42:20,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2023-11-26 13:42:35,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3414873.3333333335, ans=0.125 2023-11-26 13:42:42,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3414873.3333333335, ans=0.05 2023-11-26 13:42:46,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3414940.0, ans=0.125 2023-11-26 13:42:54,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-26 13:42:55,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=22.5 2023-11-26 13:42:57,947 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7250, loss[loss=0.04567, simple_loss=0.05572, pruned_loss=0.008547, audio_tagging_loss=0.009263, over 14887.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08837, pruned_loss=0.01189, audio_tagging_loss=0.009051, over 3046986.78 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:43:11,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3415073.3333333335, ans=0.125 2023-11-26 13:43:12,101 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:43:22,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3415140.0, ans=0.125 2023-11-26 13:43:27,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3415140.0, ans=0.0 2023-11-26 13:43:31,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3415206.6666666665, ans=0.2 2023-11-26 13:43:40,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. 
limit=15.0 2023-11-26 13:43:41,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3415273.3333333335, ans=0.125 2023-11-26 13:43:46,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3415273.3333333335, ans=0.5 2023-11-26 13:43:51,085 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-26 13:43:54,238 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7300, loss[loss=0.05184, simple_loss=0.06791, pruned_loss=0.007866, audio_tagging_loss=0.01002, over 15894.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08806, pruned_loss=0.01191, audio_tagging_loss=0.009012, over 3048582.48 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:43:59,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.745e+01 9.385e+01 1.003e+02 1.402e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 13:44:01,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3415340.0, ans=0.125 2023-11-26 13:44:30,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3415540.0, ans=0.125 2023-11-26 13:44:38,311 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:44:44,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3415606.6666666665, ans=0.1 2023-11-26 13:44:47,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-26 13:44:50,499 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7350, loss[loss=0.06048, simple_loss=0.08903, pruned_loss=0.009315, audio_tagging_loss=0.006654, over 15163.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08846, pruned_loss=0.01202, audio_tagging_loss=0.008801, over 3051689.28 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:44:55,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3415673.3333333335, ans=0.1 2023-11-26 13:44:57,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3415673.3333333335, ans=0.125 2023-11-26 13:45:00,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3415740.0, ans=0.125 2023-11-26 13:45:24,316 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.83 vs. limit=6.0 2023-11-26 13:45:27,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3415873.3333333335, ans=0.125 2023-11-26 13:45:43,366 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-26 13:45:46,755 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7400, loss[loss=0.05822, simple_loss=0.07124, pruned_loss=0.01252, audio_tagging_loss=0.01008, over 15045.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08871, pruned_loss=0.01206, audio_tagging_loss=0.008718, over 3047584.99 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:45:48,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3416006.6666666665, ans=0.125 2023-11-26 13:45:50,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.921e+01 9.521e+01 1.008e+02 1.264e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 13:45:58,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3416073.3333333335, ans=0.125 2023-11-26 13:46:02,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3416073.3333333335, ans=0.1 2023-11-26 13:46:05,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=12.0 2023-11-26 13:46:10,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2023-11-26 13:46:33,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3416273.3333333335, ans=0.125 2023-11-26 13:46:40,229 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-26 13:46:43,236 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7450, loss[loss=0.05643, simple_loss=0.08054, pruned_loss=0.006968, audio_tagging_loss=0.009194, over 16275.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08891, pruned_loss=0.01202, audio_tagging_loss=0.008573, over 3040797.29 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:46:58,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3416406.6666666665, ans=0.125 2023-11-26 13:47:16,519 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:47:19,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3416540.0, ans=0.125 2023-11-26 13:47:35,961 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-26 13:47:39,111 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7500, loss[loss=0.05543, simple_loss=0.07796, pruned_loss=0.009357, audio_tagging_loss=0.007088, over 14645.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08935, pruned_loss=0.01199, audio_tagging_loss=0.008596, over 3045891.53 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:47:39,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=15.0 2023-11-26 13:47:43,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.713e+01 9.201e+01 9.904e+01 1.159e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:47:48,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3416740.0, ans=0.125 2023-11-26 13:47:50,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3416740.0, ans=0.1 2023-11-26 13:48:31,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-26 13:48:34,365 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7550, loss[loss=0.0442, simple_loss=0.0635, pruned_loss=0.003137, audio_tagging_loss=0.009311, over 14550.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08889, pruned_loss=0.01196, audio_tagging_loss=0.00855, over 3042209.91 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:48:45,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3417073.3333333335, ans=15.0 2023-11-26 13:49:05,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3417140.0, ans=0.1 2023-11-26 13:49:14,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3417206.6666666665, ans=0.07 2023-11-26 13:49:27,949 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-26 13:49:30,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3417340.0, ans=0.2 2023-11-26 13:49:30,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3417340.0, ans=0.07 2023-11-26 13:49:31,662 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7600, loss[loss=0.05961, simple_loss=0.08213, pruned_loss=0.008962, audio_tagging_loss=0.009588, over 16005.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08908, pruned_loss=0.01205, audio_tagging_loss=0.008561, over 3043763.28 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:49:35,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2023-11-26 13:49:35,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.691e+01 9.310e+01 9.815e+01 1.310e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 13:50:24,319 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-26 13:50:27,355 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7650, loss[loss=0.07651, simple_loss=0.09071, pruned_loss=0.02122, audio_tagging_loss=0.009936, over 14313.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.0887, pruned_loss=0.01215, audio_tagging_loss=0.008601, over 3040653.46 frames. 
], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:50:30,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3417673.3333333335, ans=0.125 2023-11-26 13:50:33,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3417673.3333333335, ans=0.2 2023-11-26 13:50:50,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-26 13:50:57,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3417806.6666666665, ans=0.125 2023-11-26 13:51:03,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3417873.3333333335, ans=0.1 2023-11-26 13:51:15,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3417940.0, ans=0.1 2023-11-26 13:51:17,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3417940.0, ans=0.125 2023-11-26 13:51:19,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=12.0 2023-11-26 13:51:19,496 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-26 13:51:22,618 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7700, loss[loss=0.0778, simple_loss=0.1049, pruned_loss=0.01479, audio_tagging_loss=0.01054, over 14501.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08848, pruned_loss=0.01211, audio_tagging_loss=0.00868, over 3038245.32 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:51:26,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.781e+01 9.620e+01 1.045e+02 1.417e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 13:51:36,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3418073.3333333335, ans=0.1 2023-11-26 13:51:39,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3418073.3333333335, ans=0.125 2023-11-26 13:51:59,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3418206.6666666665, ans=0.0 2023-11-26 13:52:02,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2023-11-26 13:52:14,131 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:52:15,484 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-26 13:52:19,055 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7750, loss[loss=0.06028, simple_loss=0.08404, pruned_loss=0.01046, audio_tagging_loss=0.007794, over 15127.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08901, pruned_loss=0.01221, audio_tagging_loss=0.00871, over 3043898.19 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:53:05,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3418606.6666666665, ans=0.125 2023-11-26 13:53:11,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3418606.6666666665, ans=0.05 2023-11-26 13:53:12,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-26 13:53:12,720 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-26 13:53:15,454 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7800, loss[loss=0.05853, simple_loss=0.07578, pruned_loss=0.01112, audio_tagging_loss=0.009521, over 15276.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09001, pruned_loss=0.01236, audio_tagging_loss=0.008687, over 3048207.96 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:53:19,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 9.103e+01 9.758e+01 1.031e+02 1.342e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-26 13:53:30,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3418740.0, ans=0.0 2023-11-26 13:53:40,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.14 vs. limit=10.0 2023-11-26 13:53:53,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-11-26 13:54:02,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418940.0, ans=0.1 2023-11-26 13:54:07,460 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-26 13:54:10,517 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7850, loss[loss=0.08358, simple_loss=0.1087, pruned_loss=0.01839, audio_tagging_loss=0.01085, over 15143.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09022, pruned_loss=0.01242, audio_tagging_loss=0.008744, over 3053563.07 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:54:19,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3419006.6666666665, ans=0.0 2023-11-26 13:54:27,792 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:54:38,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3419140.0, ans=0.1 2023-11-26 13:54:39,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3419140.0, ans=0.125 2023-11-26 13:55:00,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3419273.3333333335, ans=0.0 2023-11-26 13:55:02,783 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512900 2023-11-26 13:55:06,486 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7900, loss[loss=0.06813, simple_loss=0.09407, pruned_loss=0.01352, audio_tagging_loss=0.007574, over 16299.00 frames. 
], tot_loss[loss=0.06669, simple_loss=0.09062, pruned_loss=0.01256, audio_tagging_loss=0.008815, over 3054088.14 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:55:11,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3419340.0, ans=0.125 2023-11-26 13:55:12,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.008e+01 9.612e+01 1.015e+02 1.376e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 13:55:14,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3419340.0, ans=0.125 2023-11-26 13:55:33,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3419473.3333333335, ans=0.0 2023-11-26 13:55:45,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2023-11-26 13:55:49,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3419606.6666666665, ans=0.05 2023-11-26 13:55:52,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3419606.6666666665, ans=0.0 2023-11-26 13:56:00,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512950 2023-11-26 13:56:03,113 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7950, loss[loss=0.07719, simple_loss=0.106, pruned_loss=0.0136, audio_tagging_loss=0.0106, over 16080.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08933, pruned_loss=0.01229, audio_tagging_loss=0.008974, over 3053795.35 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:56:17,930 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:56:35,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3419873.3333333335, ans=0.1 2023-11-26 13:56:54,982 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513000 2023-11-26 13:56:58,391 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8000, loss[loss=0.06055, simple_loss=0.07563, pruned_loss=0.0134, audio_tagging_loss=0.009328, over 15171.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08902, pruned_loss=0.01237, audio_tagging_loss=0.00907, over 3044538.58 frames. 
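The grad_scale field flips between 16.0 and 32.0 across the entries above, the usual signature of dynamic loss scaling in fp16 training (as implemented by torch.cuda.amp.GradScaler: halve the scale on gradient overflow, double it after a long run of overflow-free steps). A minimal simulation of that rule; growth_interval=2000 is GradScaler's default and an assumption here, not a value read from this run:

```python
# Dynamic loss-scaling rule sketch (not this run's optimizer code).
def next_scale(scale, overflowed, clean_steps, growth_interval=2000):
    if overflowed:
        return scale * 0.5, 0          # back off after inf/nan gradients
    clean_steps += 1
    if clean_steps >= growth_interval:
        return scale * 2.0, 0          # grow after a clean stretch
    return scale, clean_steps

scale, clean = 16.0, 0
for _ in range(2000):
    scale, clean = next_scale(scale, overflowed=False, clean_steps=clean)
print(scale)  # 32.0 -- the same 16.0 -> 32.0 jump seen around batch 8000
```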
], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:56:58,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3420006.6666666665, ans=0.125 2023-11-26 13:57:03,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.642e+01 9.203e+01 9.908e+01 1.245e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 13:57:47,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3420273.3333333335, ans=0.1 2023-11-26 13:57:50,830 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513050 2023-11-26 13:57:54,565 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8050, loss[loss=0.055, simple_loss=0.06852, pruned_loss=0.01087, audio_tagging_loss=0.00988, over 15288.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08878, pruned_loss=0.01236, audio_tagging_loss=0.009102, over 3036148.23 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:58:19,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2023-11-26 13:58:21,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3420473.3333333335, ans=0.0 2023-11-26 13:58:40,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-26 13:58:46,630 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513100 2023-11-26 13:58:50,315 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8100, loss[loss=0.03829, simple_loss=0.05212, pruned_loss=0.003888, audio_tagging_loss=0.008344, over 14025.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08864, pruned_loss=0.01229, audio_tagging_loss=0.009106, over 3038135.83 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:58:56,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.587e+01 9.236e+01 9.771e+01 1.279e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 13:58:58,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3420673.3333333335, ans=0.0 2023-11-26 13:59:06,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=22.5 2023-11-26 13:59:06,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. limit=10.0 2023-11-26 13:59:08,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3420740.0, ans=0.09899494936611666 2023-11-26 13:59:32,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-11-26 13:59:43,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513150 2023-11-26 13:59:46,086 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8150, loss[loss=0.06062, simple_loss=0.07535, pruned_loss=0.01156, audio_tagging_loss=0.01138, over 15112.00 frames. 
], tot_loss[loss=0.06537, simple_loss=0.08865, pruned_loss=0.01219, audio_tagging_loss=0.008857, over 3035767.89 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:00:04,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2023-11-26 14:00:06,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3421140.0, ans=0.2 2023-11-26 14:00:20,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2023-11-26 14:00:31,881 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2023-11-26 14:00:37,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513200 2023-11-26 14:00:41,332 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8200, loss[loss=0.07011, simple_loss=0.09277, pruned_loss=0.01253, audio_tagging_loss=0.0112, over 14400.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08895, pruned_loss=0.01212, audio_tagging_loss=0.00882, over 3040556.70 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:00:42,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3421340.0, ans=0.125 2023-11-26 14:00:43,501 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:00:43,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3421340.0, ans=0.0 2023-11-26 14:00:44,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-26 14:00:44,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2023-11-26 14:00:47,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.881e+01 9.462e+01 1.023e+02 1.490e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 14:00:51,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3421340.0, ans=0.0 2023-11-26 14:01:01,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3421406.6666666665, ans=0.0 2023-11-26 14:01:34,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513250 2023-11-26 14:01:34,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3421606.6666666665, ans=0.1 2023-11-26 14:01:37,849 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8250, loss[loss=0.06023, simple_loss=0.08987, pruned_loss=0.0105, audio_tagging_loss=0.004795, over 14222.00 frames. 
], tot_loss[loss=0.06491, simple_loss=0.08867, pruned_loss=0.01188, audio_tagging_loss=0.008695, over 3039757.91 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:02:18,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3421873.3333333335, ans=0.0 2023-11-26 14:02:22,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3421940.0, ans=0.125 2023-11-26 14:02:23,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3421940.0, ans=0.09899494936611666 2023-11-26 14:02:30,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513300 2023-11-26 14:02:34,292 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8300, loss[loss=0.04179, simple_loss=0.06353, pruned_loss=0.002583, audio_tagging_loss=0.007447, over 13380.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08876, pruned_loss=0.01201, audio_tagging_loss=0.00865, over 3038161.88 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:02:40,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.839e+01 9.487e+01 1.004e+02 1.588e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 14:02:41,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3422006.6666666665, ans=0.125 2023-11-26 14:02:43,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3422073.3333333335, ans=0.0 2023-11-26 14:03:11,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3422206.6666666665, ans=0.0 2023-11-26 14:03:14,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422206.6666666665, ans=0.1 2023-11-26 14:03:26,297 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513350 2023-11-26 14:03:29,407 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8350, loss[loss=0.04408, simple_loss=0.05541, pruned_loss=0.008452, audio_tagging_loss=0.00792, over 13848.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.0874, pruned_loss=0.01181, audio_tagging_loss=0.008734, over 3037035.47 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 14:03:49,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3422406.6666666665, ans=0.1 2023-11-26 14:03:53,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3422473.3333333335, ans=0.2 2023-11-26 14:04:22,058 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513400 2023-11-26 14:04:25,969 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8400, loss[loss=0.04668, simple_loss=0.05534, pruned_loss=0.009494, audio_tagging_loss=0.009516, over 16692.00 frames. ], tot_loss[loss=0.06388, simple_loss=0.0868, pruned_loss=0.01178, audio_tagging_loss=0.008696, over 3044611.24 frames. 
], batch size: 65, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:04:30,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3422673.3333333335, ans=0.0 2023-11-26 14:04:33,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.557e+01 9.224e+01 9.865e+01 1.202e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 14:04:40,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3422740.0, ans=0.025 2023-11-26 14:04:49,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3422806.6666666665, ans=0.125 2023-11-26 14:05:14,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3422940.0, ans=0.125 2023-11-26 14:05:18,114 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513450 2023-11-26 14:05:21,190 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8450, loss[loss=0.05884, simple_loss=0.07924, pruned_loss=0.008718, audio_tagging_loss=0.0105, over 15007.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08761, pruned_loss=0.01185, audio_tagging_loss=0.008658, over 3044885.91 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:05:30,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3423006.6666666665, ans=0.125 2023-11-26 14:05:37,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3423073.3333333335, ans=0.0 2023-11-26 14:05:38,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3423073.3333333335, ans=6.0 2023-11-26 14:05:39,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3423073.3333333335, ans=0.125 2023-11-26 14:05:50,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3423140.0, ans=0.0 2023-11-26 14:05:51,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3423140.0, ans=0.2 2023-11-26 14:05:54,659 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=22.5 2023-11-26 14:06:04,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3423273.3333333335, ans=0.125 2023-11-26 14:06:13,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513500 2023-11-26 14:06:14,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3423273.3333333335, ans=0.0 2023-11-26 14:06:16,996 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8500, loss[loss=0.06571, simple_loss=0.09209, pruned_loss=0.01054, audio_tagging_loss=0.009123, over 14647.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08867, pruned_loss=0.01198, audio_tagging_loss=0.008629, over 3050085.26 frames. 
], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:06:19,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2023-11-26 14:06:24,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.916e+01 9.646e+01 1.037e+02 1.510e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 14:06:31,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2023-11-26 14:06:38,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3423473.3333333335, ans=0.2 2023-11-26 14:07:09,550 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513550 2023-11-26 14:07:12,663 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8550, loss[loss=0.07631, simple_loss=0.1106, pruned_loss=0.01461, audio_tagging_loss=0.006401, over 14914.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08926, pruned_loss=0.01221, audio_tagging_loss=0.008646, over 3052103.64 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:07:13,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2023-11-26 14:07:14,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3423673.3333333335, ans=0.05 2023-11-26 14:07:23,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3423740.0, ans=0.0 2023-11-26 14:07:23,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3423740.0, ans=0.2 2023-11-26 14:07:36,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5 2023-11-26 14:07:37,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3423806.6666666665, ans=0.1 2023-11-26 14:07:40,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3423806.6666666665, ans=0.2 2023-11-26 14:07:59,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-11-26 14:08:05,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513600 2023-11-26 14:08:09,295 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8600, loss[loss=0.0626, simple_loss=0.08677, pruned_loss=0.009276, audio_tagging_loss=0.009942, over 14635.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.0891, pruned_loss=0.01215, audio_tagging_loss=0.008782, over 3046945.17 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:08:12,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3424006.6666666665, ans=0.0 2023-11-26 14:08:16,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.755e+01 9.267e+01 1.010e+02 1.487e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 14:08:33,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3424140.0, ans=0.125 2023-11-26 14:08:39,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3424140.0, ans=0.1 2023-11-26 14:08:57,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=22.5 2023-11-26 14:08:58,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3424273.3333333335, ans=0.125 2023-11-26 14:09:01,404 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513650 2023-11-26 14:09:05,084 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8650, loss[loss=0.06367, simple_loss=0.08437, pruned_loss=0.01176, audio_tagging_loss=0.009726, over 14395.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08925, pruned_loss=0.01207, audio_tagging_loss=0.008848, over 3053205.84 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:09:14,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3424406.6666666665, ans=0.1 2023-11-26 14:09:24,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3424406.6666666665, ans=15.0 2023-11-26 14:09:34,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3424473.3333333335, ans=0.05 2023-11-26 14:09:34,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3424473.3333333335, ans=10.0 2023-11-26 14:09:38,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-26 14:09:39,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3424540.0, ans=0.07 2023-11-26 14:09:42,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3424540.0, ans=0.0 2023-11-26 14:09:56,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513700 2023-11-26 14:10:00,532 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8700, loss[loss=0.05905, simple_loss=0.07674, pruned_loss=0.008919, audio_tagging_loss=0.01177, over 14426.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08977, pruned_loss=0.01224, audio_tagging_loss=0.008911, over 3056317.62 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:10:02,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. 
limit=10.0 2023-11-26 14:10:06,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3424673.3333333335, ans=0.125 2023-11-26 14:10:08,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 8.732e+01 9.410e+01 1.013e+02 1.633e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 14:10:09,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2023-11-26 14:10:09,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.86 vs. limit=10.0 2023-11-26 14:10:14,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3424740.0, ans=0.5 2023-11-26 14:10:33,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2023-11-26 14:10:45,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3424940.0, ans=0.125 2023-11-26 14:10:53,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513750 2023-11-26 14:10:57,010 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8750, loss[loss=0.05893, simple_loss=0.08065, pruned_loss=0.008404, audio_tagging_loss=0.0102, over 14882.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09023, pruned_loss=0.01232, audio_tagging_loss=0.008944, over 3056586.10 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:11:06,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3425073.3333333335, ans=0.0 2023-11-26 14:11:09,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3425073.3333333335, ans=0.0 2023-11-26 14:11:16,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3425073.3333333335, ans=0.125 2023-11-26 14:11:49,036 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513800 2023-11-26 14:11:52,394 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8800, loss[loss=0.04696, simple_loss=0.05982, pruned_loss=0.007689, audio_tagging_loss=0.009362, over 15696.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09132, pruned_loss=0.01261, audio_tagging_loss=0.008969, over 3056459.63 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:11:55,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3425340.0, ans=0.125 2023-11-26 14:12:00,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.965e+01 9.351e+01 9.840e+01 1.391e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 14:12:03,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3425406.6666666665, ans=0.125 2023-11-26 14:12:09,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.83 vs. 
limit=22.5 2023-11-26 14:12:20,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3425473.3333333335, ans=0.0 2023-11-26 14:12:26,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3425540.0, ans=0.125 2023-11-26 14:12:28,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3425540.0, ans=0.07 2023-11-26 14:12:30,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3425540.0, ans=0.125 2023-11-26 14:12:43,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-26 14:12:44,794 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513850 2023-11-26 14:12:48,488 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8850, loss[loss=0.07714, simple_loss=0.1059, pruned_loss=0.0163, audio_tagging_loss=0.007896, over 14829.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09072, pruned_loss=0.01249, audio_tagging_loss=0.008967, over 3044557.60 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:12:54,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-26 14:12:55,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3425673.3333333335, ans=0.0 2023-11-26 14:13:01,195 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:13:01,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3425740.0, ans=0.2 2023-11-26 14:13:15,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-11-26 14:13:22,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3425873.3333333335, ans=0.1 2023-11-26 14:13:40,746 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513900 2023-11-26 14:13:44,392 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8900, loss[loss=0.05176, simple_loss=0.07084, pruned_loss=0.006066, audio_tagging_loss=0.01028, over 14982.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09041, pruned_loss=0.01229, audio_tagging_loss=0.008846, over 3044874.76 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:13:46,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. 
limit=10.0 2023-11-26 14:13:52,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.670e+01 9.413e+01 1.057e+02 1.382e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 14:13:54,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3426073.3333333335, ans=0.0 2023-11-26 14:14:36,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513950 2023-11-26 14:14:39,371 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8950, loss[loss=0.06637, simple_loss=0.08856, pruned_loss=0.01296, audio_tagging_loss=0.009118, over 14744.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09016, pruned_loss=0.01221, audio_tagging_loss=0.008718, over 3046887.39 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:14:45,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3426340.0, ans=0.025 2023-11-26 14:14:53,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3426406.6666666665, ans=0.2 2023-11-26 14:15:01,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2023-11-26 14:15:05,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-26 14:15:17,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3426540.0, ans=0.0 2023-11-26 14:15:19,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3426540.0, ans=0.2 2023-11-26 14:15:23,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3426606.6666666665, ans=0.125 2023-11-26 14:15:25,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3426606.6666666665, ans=0.2 2023-11-26 14:15:26,812 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2023-11-26 14:15:30,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514000 2023-11-26 14:15:34,072 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9000, loss[loss=0.07816, simple_loss=0.1205, pruned_loss=0.01151, audio_tagging_loss=0.006427, over 16107.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09077, pruned_loss=0.01233, audio_tagging_loss=0.008578, over 3058967.25 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:15:34,073 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 14:15:59,486 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2602, 2.9699, 3.2558, 2.9767, 3.6535, 3.7192, 3.2398, 3.2143], device='cuda:2') 2023-11-26 14:16:06,634 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05882, simple_loss=0.0506, pruned_loss=0.005335, audio_tagging_loss=0.02819, over 4681554.00 frames. 
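The recurring WARNING records in this log ("Exclude cut with ID unbalanced/... from training") spell out the filtering rule applied to the dummy-text AudioSet cuts: each is 100 frames long, shrinks to 23 frames after subsampling, and carries 24 BPE tokens, so it has fewer encoder frames than tokens, presumably leaving a transducer alignment impossible. A minimal sketch of such a validity filter is below; the function names and the exact subsampling arithmetic are assumptions chosen to be consistent with the logged numbers (100 frames in, 23 out), not code taken from train_asr.py:

def frames_after_subsampling(num_frames: int) -> int:
    # One 4x conv-subsampling arithmetic consistent with the logged
    # figures (100 frames in -> 23 frames out); an assumption, not
    # lifted from the actual encoder front-end.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A cut is excluded when it yields fewer subsampled frames than
    # it has tokens, matching the WARNING lines (23 frames < 24 tokens).
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded dummy-text cuts

Such a check would explain why only the one-second placeholder-transcript cuts appear in these warnings while ordinary speech cuts pass through untouched.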
2023-11-26 14:16:06,635 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 14:16:15,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.126e+01 8.986e+01 9.503e+01 1.043e+02 1.217e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 14:16:16,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3426740.0, ans=0.2 2023-11-26 14:16:24,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3426740.0, ans=0.0 2023-11-26 14:16:29,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2023-11-26 14:16:40,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3426873.3333333335, ans=0.125 2023-11-26 14:16:58,888 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514050 2023-11-26 14:17:01,969 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9050, loss[loss=0.06103, simple_loss=0.07093, pruned_loss=0.01125, audio_tagging_loss=0.01432, over 15477.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09127, pruned_loss=0.01251, audio_tagging_loss=0.008557, over 3053958.32 frames. ], batch size: 63, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:17:06,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3427006.6666666665, ans=0.1 2023-11-26 14:17:16,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3427073.3333333335, ans=0.125 2023-11-26 14:17:17,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3427073.3333333335, ans=0.0 2023-11-26 14:17:47,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2023-11-26 14:17:51,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3427273.3333333335, ans=0.1 2023-11-26 14:17:54,115 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514100 2023-11-26 14:17:57,010 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:17:57,790 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9100, loss[loss=0.07496, simple_loss=0.1031, pruned_loss=0.01386, audio_tagging_loss=0.009576, over 15046.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09088, pruned_loss=0.01245, audio_tagging_loss=0.008491, over 3061003.31 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:18:05,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3427340.0, ans=0.0 2023-11-26 14:18:07,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.883e+01 9.542e+01 1.028e+02 1.451e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 14:18:11,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3427406.6666666665, ans=0.0 2023-11-26 14:18:16,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=12.0 2023-11-26 14:18:19,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3427406.6666666665, ans=0.2 2023-11-26 14:18:23,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3427473.3333333335, ans=0.2 2023-11-26 14:18:26,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3427473.3333333335, ans=0.125 2023-11-26 14:18:26,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3427473.3333333335, ans=0.09899494936611666 2023-11-26 14:18:28,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3427473.3333333335, ans=0.125 2023-11-26 14:18:30,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3427540.0, ans=0.125 2023-11-26 14:18:43,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3427606.6666666665, ans=0.125 2023-11-26 14:18:43,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3427606.6666666665, ans=0.2 2023-11-26 14:18:51,480 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514150 2023-11-26 14:18:53,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. limit=6.0 2023-11-26 14:18:55,212 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9150, loss[loss=0.06345, simple_loss=0.09636, pruned_loss=0.007916, audio_tagging_loss=0.007356, over 15674.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09041, pruned_loss=0.01243, audio_tagging_loss=0.008505, over 3057394.16 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:19:19,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3427806.6666666665, ans=0.125 2023-11-26 14:19:39,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3427940.0, ans=0.0 2023-11-26 14:19:46,764 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514200 2023-11-26 14:19:50,129 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9200, loss[loss=0.06317, simple_loss=0.0838, pruned_loss=0.01162, audio_tagging_loss=0.009652, over 16143.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08965, pruned_loss=0.01224, audio_tagging_loss=0.00859, over 3049637.73 frames. 
], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:19:55,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-26 14:19:58,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.729e+01 9.387e+01 1.004e+02 1.309e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 14:20:29,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=12.0 2023-11-26 14:20:42,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2023-11-26 14:20:42,644 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514250 2023-11-26 14:20:45,771 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9250, loss[loss=0.06837, simple_loss=0.08862, pruned_loss=0.01551, audio_tagging_loss=0.008546, over 15354.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09019, pruned_loss=0.01249, audio_tagging_loss=0.008546, over 3047512.75 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:20:51,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3428340.0, ans=0.125 2023-11-26 14:21:14,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-26 14:21:18,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3428540.0, ans=0.125 2023-11-26 14:21:38,504 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514300 2023-11-26 14:21:42,680 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9300, loss[loss=0.07508, simple_loss=0.1048, pruned_loss=0.01326, audio_tagging_loss=0.009397, over 14773.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0897, pruned_loss=0.01248, audio_tagging_loss=0.008615, over 3046420.38 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:21:51,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.838e+01 9.271e+01 1.020e+02 1.264e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 14:21:53,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3428740.0, ans=0.125 2023-11-26 14:22:02,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3428740.0, ans=0.125 2023-11-26 14:22:21,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-26 14:22:23,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-11-26 14:22:29,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3428940.0, ans=0.0 2023-11-26 14:22:32,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.99 vs. 
limit=12.0 2023-11-26 14:22:35,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514350 2023-11-26 14:22:38,798 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9350, loss[loss=0.05229, simple_loss=0.07011, pruned_loss=0.008375, audio_tagging_loss=0.008859, over 16257.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08972, pruned_loss=0.01249, audio_tagging_loss=0.008614, over 3054380.43 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:22:48,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3429073.3333333335, ans=0.125 2023-11-26 14:22:55,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3429073.3333333335, ans=0.05 2023-11-26 14:23:07,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3429140.0, ans=0.2 2023-11-26 14:23:18,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3429206.6666666665, ans=0.125 2023-11-26 14:23:27,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3429273.3333333335, ans=0.2 2023-11-26 14:23:30,919 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514400 2023-11-26 14:23:33,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3429340.0, ans=0.125 2023-11-26 14:23:34,339 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9400, loss[loss=0.07034, simple_loss=0.1002, pruned_loss=0.01217, audio_tagging_loss=0.008079, over 15094.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08944, pruned_loss=0.01243, audio_tagging_loss=0.008765, over 3053091.76 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:23:44,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.705e+01 9.718e+01 1.044e+02 1.326e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-26 14:23:51,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3429406.6666666665, ans=0.125 2023-11-26 14:23:54,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429406.6666666665, ans=0.1 2023-11-26 14:23:57,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2023-11-26 14:24:07,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3429540.0, ans=0.05 2023-11-26 14:24:23,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3429606.6666666665, ans=0.2 2023-11-26 14:24:26,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3429606.6666666665, ans=0.125 2023-11-26 14:24:27,195 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514450 2023-11-26 14:24:30,859 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9450, loss[loss=0.08909, simple_loss=0.1153, pruned_loss=0.02198, audio_tagging_loss=0.009463, over 15154.00 frames. 
], tot_loss[loss=0.06657, simple_loss=0.09031, pruned_loss=0.0126, audio_tagging_loss=0.008819, over 3053590.53 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:24:30,903 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:24:40,138 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:24:55,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3429806.6666666665, ans=0.0 2023-11-26 14:24:56,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3429806.6666666665, ans=0.125 2023-11-26 14:24:58,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429806.6666666665, ans=0.1 2023-11-26 14:25:03,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3429873.3333333335, ans=0.0 2023-11-26 14:25:09,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3429873.3333333335, ans=0.0 2023-11-26 14:25:15,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3429940.0, ans=0.1 2023-11-26 14:25:23,933 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514500 2023-11-26 14:25:27,575 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9500, loss[loss=0.06591, simple_loss=0.09146, pruned_loss=0.01038, audio_tagging_loss=0.009801, over 15249.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.0909, pruned_loss=0.01272, audio_tagging_loss=0.008791, over 3058290.02 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:25:37,195 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.962e+01 9.623e+01 1.023e+02 1.293e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 14:25:58,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3430140.0, ans=0.0 2023-11-26 14:26:19,716 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514550 2023-11-26 14:26:22,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3430340.0, ans=0.0 2023-11-26 14:26:22,796 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9550, loss[loss=0.08174, simple_loss=0.1055, pruned_loss=0.02129, audio_tagging_loss=0.007687, over 14419.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09098, pruned_loss=0.01279, audio_tagging_loss=0.00893, over 3050587.25 frames. 
], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:26:30,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3430340.0, ans=0.125 2023-11-26 14:26:30,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3430340.0, ans=0.125 2023-11-26 14:26:36,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3430406.6666666665, ans=0.1 2023-11-26 14:27:03,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3430540.0, ans=0.125 2023-11-26 14:27:09,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3430606.6666666665, ans=0.0 2023-11-26 14:27:14,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3430606.6666666665, ans=0.05 2023-11-26 14:27:14,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3430606.6666666665, ans=0.95 2023-11-26 14:27:15,668 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514600 2023-11-26 14:27:19,025 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9600, loss[loss=0.06444, simple_loss=0.09197, pruned_loss=0.01127, audio_tagging_loss=0.007184, over 15029.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09022, pruned_loss=0.01258, audio_tagging_loss=0.008968, over 3048133.64 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:27:25,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3430673.3333333335, ans=0.07 2023-11-26 14:27:29,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.862e+01 9.478e+01 1.011e+02 2.091e+02, threshold=1.896e+02, percent-clipped=1.0 2023-11-26 14:28:09,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3430940.0, ans=0.0 2023-11-26 14:28:12,593 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514650 2023-11-26 14:28:15,752 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9650, loss[loss=0.07624, simple_loss=0.1081, pruned_loss=0.01586, audio_tagging_loss=0.00634, over 15008.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09077, pruned_loss=0.01269, audio_tagging_loss=0.00893, over 3048676.08 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:28:23,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3431006.6666666665, ans=0.0 2023-11-26 14:28:40,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3431140.0, ans=0.0 2023-11-26 14:28:52,346 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-11-26 14:29:01,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. 
limit=15.0 2023-11-26 14:29:02,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3431273.3333333335, ans=0.1 2023-11-26 14:29:08,401 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514700 2023-11-26 14:29:11,519 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9700, loss[loss=0.08777, simple_loss=0.1233, pruned_loss=0.01804, audio_tagging_loss=0.008107, over 15828.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09104, pruned_loss=0.01271, audio_tagging_loss=0.008668, over 3053965.22 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:29:21,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.878e+01 9.480e+01 1.018e+02 1.289e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 14:29:23,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3431406.6666666665, ans=0.125 2023-11-26 14:29:35,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.22 vs. limit=10.0 2023-11-26 14:29:41,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-26 14:29:47,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0 2023-11-26 14:30:04,735 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514750 2023-11-26 14:30:07,848 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9750, loss[loss=0.03988, simple_loss=0.05299, pruned_loss=0.003945, audio_tagging_loss=0.009441, over 14696.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09, pruned_loss=0.01229, audio_tagging_loss=0.008618, over 3051781.65 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:30:20,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3431740.0, ans=0.0 2023-11-26 14:30:28,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3431740.0, ans=0.035 2023-11-26 14:30:42,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-11-26 14:30:50,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3431873.3333333335, ans=0.0 2023-11-26 14:31:01,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514800 2023-11-26 14:31:04,623 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9800, loss[loss=0.05709, simple_loss=0.07352, pruned_loss=0.01117, audio_tagging_loss=0.009165, over 15670.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0904, pruned_loss=0.01228, audio_tagging_loss=0.008621, over 3052251.21 frames. 
], batch size: 65, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:31:04,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3432006.6666666665, ans=0.09899494936611666 2023-11-26 14:31:10,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3432006.6666666665, ans=0.07 2023-11-26 14:31:14,167 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.979e+01 9.504e+01 1.025e+02 1.204e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 14:31:26,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3432140.0, ans=0.1 2023-11-26 14:31:27,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3432140.0, ans=0.125 2023-11-26 14:31:31,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2023-11-26 14:31:33,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-11-26 14:31:56,028 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:31:57,145 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-26 14:32:00,294 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9850, loss[loss=0.0878, simple_loss=0.1256, pruned_loss=0.01774, audio_tagging_loss=0.00725, over 15661.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09107, pruned_loss=0.01242, audio_tagging_loss=0.008548, over 3054085.45 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:32:05,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3432340.0, ans=0.125 2023-11-26 14:32:13,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2023-11-26 14:32:24,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3432473.3333333335, ans=0.125 2023-11-26 14:32:27,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2023-11-26 14:32:37,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2023-11-26 14:32:46,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-26 14:32:46,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.91 vs. 
limit=15.0 2023-11-26 14:32:52,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-26 14:32:53,113 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-26 14:32:56,778 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9900, loss[loss=0.05673, simple_loss=0.07374, pruned_loss=0.007884, audio_tagging_loss=0.01198, over 15128.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.0912, pruned_loss=0.01245, audio_tagging_loss=0.008563, over 3049772.74 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:33:05,437 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2023-11-26 14:33:07,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.992e+01 8.539e+01 9.208e+01 1.007e+02 1.176e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 14:33:32,299 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.27 vs. limit=10.0 2023-11-26 14:33:38,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-26 14:33:50,429 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-26 14:33:52,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3433006.6666666665, ans=0.125 2023-11-26 14:33:53,512 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9950, loss[loss=0.06597, simple_loss=0.09428, pruned_loss=0.01062, audio_tagging_loss=0.008208, over 15496.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09109, pruned_loss=0.01247, audio_tagging_loss=0.00848, over 3047334.36 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:34:25,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3433206.6666666665, ans=0.0 2023-11-26 14:34:29,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3433206.6666666665, ans=0.05 2023-11-26 14:34:29,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3433206.6666666665, ans=0.1 2023-11-26 14:34:31,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433206.6666666665, ans=0.1 2023-11-26 14:34:40,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3433273.3333333335, ans=0.125 2023-11-26 14:34:45,636 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-26 14:34:49,094 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10000, loss[loss=0.07208, simple_loss=0.1037, pruned_loss=0.01478, audio_tagging_loss=0.005454, over 15568.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09037, pruned_loss=0.01243, audio_tagging_loss=0.00851, over 3051335.80 frames. 
], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:34:59,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.778e+01 9.390e+01 1.020e+02 2.265e+02, threshold=1.878e+02, percent-clipped=1.0 2023-11-26 14:35:15,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3433473.3333333335, ans=0.125 2023-11-26 14:35:31,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3433540.0, ans=0.0 2023-11-26 14:35:32,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3433606.6666666665, ans=0.125 2023-11-26 14:35:34,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3433606.6666666665, ans=0.1 2023-11-26 14:35:41,072 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-26 14:35:45,415 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10050, loss[loss=0.05951, simple_loss=0.0822, pruned_loss=0.01053, audio_tagging_loss=0.007883, over 15147.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08971, pruned_loss=0.01218, audio_tagging_loss=0.00849, over 3049867.03 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:35:48,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3433673.3333333335, ans=0.125 2023-11-26 14:36:01,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433740.0, ans=0.1 2023-11-26 14:36:28,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3433940.0, ans=0.09899494936611666 2023-11-26 14:36:33,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3433940.0, ans=0.125 2023-11-26 14:36:37,453 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-26 14:36:41,184 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10100, loss[loss=0.0473, simple_loss=0.06062, pruned_loss=0.007533, audio_tagging_loss=0.009452, over 15603.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08979, pruned_loss=0.01211, audio_tagging_loss=0.008614, over 3049656.12 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:36:51,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.526e+01 9.238e+01 9.912e+01 1.286e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 14:36:52,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3434073.3333333335, ans=0.09899494936611666 2023-11-26 14:37:11,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3434140.0, ans=0.125 2023-11-26 14:37:14,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3434206.6666666665, ans=0.125 2023-11-26 14:37:18,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. 
limit=15.0 2023-11-26 14:37:23,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5 2023-11-26 14:37:28,163 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:37:30,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3434273.3333333335, ans=0.1 2023-11-26 14:37:33,540 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-26 14:37:36,672 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10150, loss[loss=0.07376, simple_loss=0.09772, pruned_loss=0.0184, audio_tagging_loss=0.006498, over 15278.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08962, pruned_loss=0.01219, audio_tagging_loss=0.008621, over 3046310.42 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:37:52,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3434406.6666666665, ans=0.125 2023-11-26 14:37:52,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3434406.6666666665, ans=0.1 2023-11-26 14:37:58,197 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:38:00,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3434473.3333333335, ans=0.0 2023-11-26 14:38:05,426 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:38:20,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.24 vs. limit=10.0 2023-11-26 14:38:28,845 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-26 14:38:29,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3434606.6666666665, ans=0.125 2023-11-26 14:38:32,191 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10200, loss[loss=0.06727, simple_loss=0.09084, pruned_loss=0.0151, audio_tagging_loss=0.006753, over 15521.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08997, pruned_loss=0.01222, audio_tagging_loss=0.00863, over 3051605.34 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:38:41,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.40 vs. 
limit=15.0 2023-11-26 14:38:44,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.939e+01 9.563e+01 1.037e+02 1.347e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 14:38:55,876 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:39:06,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3434873.3333333335, ans=0.2 2023-11-26 14:39:24,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.10 vs. limit=15.0 2023-11-26 14:39:26,316 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-26 14:39:29,455 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10250, loss[loss=0.07153, simple_loss=0.08418, pruned_loss=0.01484, audio_tagging_loss=0.0146, over 14483.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09018, pruned_loss=0.01229, audio_tagging_loss=0.008762, over 3052633.80 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:39:32,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3435006.6666666665, ans=0.1 2023-11-26 14:39:34,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3435006.6666666665, ans=0.125 2023-11-26 14:40:22,400 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-26 14:40:25,491 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10300, loss[loss=0.04976, simple_loss=0.0667, pruned_loss=0.00732, audio_tagging_loss=0.009089, over 14835.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08995, pruned_loss=0.01229, audio_tagging_loss=0.008815, over 3052336.45 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:40:36,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.928e+01 8.849e+01 9.518e+01 1.017e+02 1.480e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 14:40:37,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3435406.6666666665, ans=0.0 2023-11-26 14:40:38,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3435406.6666666665, ans=0.0 2023-11-26 14:41:03,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.25 vs. limit=15.0 2023-11-26 14:41:07,783 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=22.5 2023-11-26 14:41:18,047 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-26 14:41:21,157 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10350, loss[loss=0.06263, simple_loss=0.08302, pruned_loss=0.01273, audio_tagging_loss=0.008386, over 15326.00 frames. 
], tot_loss[loss=0.06589, simple_loss=0.08934, pruned_loss=0.01226, audio_tagging_loss=0.008956, over 3050342.87 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:41:23,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3435673.3333333335, ans=0.125 2023-11-26 14:41:27,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3435673.3333333335, ans=0.125 2023-11-26 14:41:40,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3435740.0, ans=0.125 2023-11-26 14:41:45,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3435806.6666666665, ans=0.0 2023-11-26 14:41:47,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3435806.6666666665, ans=0.125 2023-11-26 14:41:53,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3435873.3333333335, ans=0.125 2023-11-26 14:42:02,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3435873.3333333335, ans=0.2 2023-11-26 14:42:13,254 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-26 14:42:13,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3435940.0, ans=0.125 2023-11-26 14:42:17,226 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10400, loss[loss=0.07643, simple_loss=0.1066, pruned_loss=0.01379, audio_tagging_loss=0.009356, over 15437.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08833, pruned_loss=0.01215, audio_tagging_loss=0.009089, over 3050413.36 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:42:29,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.791e+01 9.464e+01 1.006e+02 1.363e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 14:42:31,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2023-11-26 14:42:39,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3436140.0, ans=0.0 2023-11-26 14:42:58,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3436206.6666666665, ans=0.0 2023-11-26 14:43:00,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3436206.6666666665, ans=0.125 2023-11-26 14:43:10,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-26 14:43:12,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3436340.0, ans=0.125 2023-11-26 14:43:13,301 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10450, loss[loss=0.0696, simple_loss=0.09613, pruned_loss=0.01262, audio_tagging_loss=0.008913, over 14583.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08795, pruned_loss=0.01207, audio_tagging_loss=0.00911, over 3046479.47 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:43:35,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3436473.3333333335, ans=0.2 2023-11-26 14:43:35,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3436473.3333333335, ans=0.125 2023-11-26 14:43:39,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3436473.3333333335, ans=0.5 2023-11-26 14:43:59,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3436606.6666666665, ans=0.2 2023-11-26 14:44:05,408 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-26 14:44:08,650 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10500, loss[loss=0.04323, simple_loss=0.05872, pruned_loss=0.005664, audio_tagging_loss=0.008201, over 15221.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08902, pruned_loss=0.01221, audio_tagging_loss=0.008882, over 3047504.75 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:44:20,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.625e+01 9.527e+01 1.023e+02 1.211e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 14:44:26,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3436740.0, ans=0.125 2023-11-26 14:44:32,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3436806.6666666665, ans=0.2 2023-11-26 14:44:54,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3436940.0, ans=0.125 2023-11-26 14:45:01,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-26 14:45:03,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3437006.6666666665, ans=0.0 2023-11-26 14:45:04,716 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10550, loss[loss=0.05276, simple_loss=0.07513, pruned_loss=0.008272, audio_tagging_loss=0.00692, over 15263.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08907, pruned_loss=0.01211, audio_tagging_loss=0.008802, over 3044465.63 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:45:24,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3437073.3333333335, ans=0.1 2023-11-26 14:45:26,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=15.0 2023-11-26 14:45:37,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=12.0 2023-11-26 14:45:58,769 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-26 14:46:01,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3437340.0, ans=0.07 2023-11-26 14:46:02,167 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10600, loss[loss=0.06426, simple_loss=0.09664, pruned_loss=0.01077, audio_tagging_loss=0.005169, over 14665.00 frames. 
], tot_loss[loss=0.06616, simple_loss=0.09028, pruned_loss=0.01237, audio_tagging_loss=0.008649, over 3041490.62 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:46:08,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3437340.0, ans=0.125 2023-11-26 14:46:14,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.031e+01 9.613e+01 1.032e+02 1.237e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 14:46:19,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437406.6666666665, ans=0.1 2023-11-26 14:46:22,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3437473.3333333335, ans=0.0 2023-11-26 14:46:27,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2023-11-26 14:46:35,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3437540.0, ans=0.2 2023-11-26 14:46:37,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3437540.0, ans=0.125 2023-11-26 14:46:54,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-26 14:46:57,553 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10650, loss[loss=0.07825, simple_loss=0.1097, pruned_loss=0.01498, audio_tagging_loss=0.008417, over 16109.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09023, pruned_loss=0.01236, audio_tagging_loss=0.008583, over 3045317.26 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:47:07,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3437740.0, ans=0.125 2023-11-26 14:47:14,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3437740.0, ans=0.1 2023-11-26 14:47:15,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3437740.0, ans=0.125 2023-11-26 14:47:25,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3437806.6666666665, ans=0.125 2023-11-26 14:47:40,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3437873.3333333335, ans=0.125 2023-11-26 14:47:50,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515700 2023-11-26 14:47:50,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3437940.0, ans=0.125 2023-11-26 14:47:53,414 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10700, loss[loss=0.06304, simple_loss=0.08932, pruned_loss=0.01148, audio_tagging_loss=0.006903, over 15572.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08911, pruned_loss=0.01217, audio_tagging_loss=0.008672, over 3041222.91 frames. 
], batch size: 58, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:47:58,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3438006.6666666665, ans=0.05 2023-11-26 14:48:07,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.817e+01 9.509e+01 1.036e+02 1.497e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 14:48:15,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3438140.0, ans=0.0 2023-11-26 14:48:22,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3438140.0, ans=0.125 2023-11-26 14:48:27,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3438206.6666666665, ans=0.125 2023-11-26 14:48:31,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3438206.6666666665, ans=0.125 2023-11-26 14:48:34,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3438206.6666666665, ans=0.125 2023-11-26 14:48:38,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-11-26 14:48:46,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515750 2023-11-26 14:48:49,890 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10750, loss[loss=0.06028, simple_loss=0.08057, pruned_loss=0.009167, audio_tagging_loss=0.01083, over 14752.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0897, pruned_loss=0.01229, audio_tagging_loss=0.008642, over 3044297.59 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:49:39,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3438606.6666666665, ans=0.0 2023-11-26 14:49:42,675 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-26 14:49:44,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2023-11-26 14:49:46,060 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10800, loss[loss=0.08589, simple_loss=0.1212, pruned_loss=0.01649, audio_tagging_loss=0.00879, over 16760.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09041, pruned_loss=0.01243, audio_tagging_loss=0.008565, over 3052177.91 frames. 
], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:49:47,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3438673.3333333335, ans=0.125 2023-11-26 14:49:51,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3438673.3333333335, ans=0.5 2023-11-26 14:49:58,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3438740.0, ans=0.0 2023-11-26 14:49:59,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.847e+01 9.608e+01 1.038e+02 1.531e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 14:50:04,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3438740.0, ans=0.125 2023-11-26 14:50:06,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3438740.0, ans=0.0 2023-11-26 14:50:18,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3438806.6666666665, ans=0.125 2023-11-26 14:50:27,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3438873.3333333335, ans=0.0 2023-11-26 14:50:34,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.82 vs. limit=6.0 2023-11-26 14:50:38,851 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-26 14:50:40,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0 2023-11-26 14:50:42,497 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10850, loss[loss=0.05197, simple_loss=0.0722, pruned_loss=0.006901, audio_tagging_loss=0.008964, over 14461.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08937, pruned_loss=0.01229, audio_tagging_loss=0.008684, over 3046749.02 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:50:43,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3439006.6666666665, ans=0.05 2023-11-26 14:50:47,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-26 14:50:56,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2023-11-26 14:51:01,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0 2023-11-26 14:51:05,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3439140.0, ans=0.125 2023-11-26 14:51:12,182 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.15 vs. 
limit=22.5 2023-11-26 14:51:12,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3439140.0, ans=0.2 2023-11-26 14:51:16,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3439206.6666666665, ans=0.0 2023-11-26 14:51:35,726 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-26 14:51:36,714 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:51:38,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=22.5 2023-11-26 14:51:38,900 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10900, loss[loss=0.06901, simple_loss=0.1008, pruned_loss=0.01117, audio_tagging_loss=0.007439, over 15511.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08884, pruned_loss=0.01215, audio_tagging_loss=0.008767, over 3043827.79 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:51:52,170 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.005e+01 9.638e+01 1.044e+02 1.421e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 14:52:12,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3439540.0, ans=0.125 2023-11-26 14:52:18,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3439540.0, ans=0.125 2023-11-26 14:52:25,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3439606.6666666665, ans=15.0 2023-11-26 14:52:26,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3439606.6666666665, ans=0.125 2023-11-26 14:52:31,167 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-26 14:52:34,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3439673.3333333335, ans=0.125 2023-11-26 14:52:34,980 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10950, loss[loss=0.07704, simple_loss=0.1035, pruned_loss=0.01594, audio_tagging_loss=0.009331, over 15850.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08943, pruned_loss=0.01215, audio_tagging_loss=0.008841, over 3047831.68 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:52:40,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3439673.3333333335, ans=0.125 2023-11-26 14:52:43,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3439673.3333333335, ans=0.2 2023-11-26 14:53:02,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3439806.6666666665, ans=0.0 2023-11-26 14:53:02,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-26 14:53:05,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3439806.6666666665, ans=0.125 2023-11-26 14:53:14,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3439873.3333333335, ans=0.125 2023-11-26 14:53:22,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3439940.0, ans=0.2 2023-11-26 14:53:24,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439940.0, ans=0.1 2023-11-26 14:53:27,459 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-26 14:53:32,853 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11000, loss[loss=0.08818, simple_loss=0.1258, pruned_loss=0.01987, audio_tagging_loss=0.005426, over 15102.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08939, pruned_loss=0.01218, audio_tagging_loss=0.008757, over 3049587.08 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:53:43,509 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:53:44,351 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:53:47,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.791e+01 9.278e+01 1.005e+02 1.404e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 14:53:59,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3440140.0, ans=0.125 2023-11-26 14:54:10,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2023-11-26 14:54:12,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.72 vs. 
limit=10.0 2023-11-26 14:54:18,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3440273.3333333335, ans=0.125 2023-11-26 14:54:25,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-26 14:54:29,441 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11050, loss[loss=0.0633, simple_loss=0.08329, pruned_loss=0.01099, audio_tagging_loss=0.01067, over 15086.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09015, pruned_loss=0.01243, audio_tagging_loss=0.008814, over 3047500.96 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:54:30,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3440340.0, ans=0.0 2023-11-26 14:54:43,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0 2023-11-26 14:55:01,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3440540.0, ans=0.0 2023-11-26 14:55:21,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-26 14:55:21,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3440606.6666666665, ans=0.1 2023-11-26 14:55:24,784 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11100, loss[loss=0.05499, simple_loss=0.0686, pruned_loss=0.006926, audio_tagging_loss=0.01376, over 14892.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09028, pruned_loss=0.01246, audio_tagging_loss=0.008901, over 3048165.48 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:55:35,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3440740.0, ans=0.125 2023-11-26 14:55:38,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.786e+01 9.291e+01 1.014e+02 1.274e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 14:55:40,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3440740.0, ans=0.125 2023-11-26 14:55:55,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2023-11-26 14:55:57,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-11-26 14:56:00,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3440873.3333333335, ans=0.125 2023-11-26 14:56:02,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3440873.3333333335, ans=0.125 2023-11-26 14:56:17,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516150 2023-11-26 14:56:17,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3440940.0, ans=0.0 2023-11-26 14:56:20,719 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11150, loss[loss=0.06475, simple_loss=0.0936, pruned_loss=0.009506, audio_tagging_loss=0.008442, over 16307.00 frames. 
], tot_loss[loss=0.06643, simple_loss=0.09002, pruned_loss=0.01243, audio_tagging_loss=0.008988, over 3053470.45 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:56:22,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441006.6666666665, ans=0.1 2023-11-26 14:56:49,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3441140.0, ans=0.125 2023-11-26 14:56:58,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2023-11-26 14:57:00,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3441206.6666666665, ans=0.1 2023-11-26 14:57:02,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=8.0 2023-11-26 14:57:09,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3441273.3333333335, ans=10.0 2023-11-26 14:57:13,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516200 2023-11-26 14:57:18,223 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11200, loss[loss=0.05903, simple_loss=0.08214, pruned_loss=0.008666, audio_tagging_loss=0.009296, over 13432.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08882, pruned_loss=0.01214, audio_tagging_loss=0.009074, over 3052288.08 frames. ], batch size: 50, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:57:30,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.740e+01 9.384e+01 1.028e+02 1.331e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 14:57:42,426 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2023-11-26 14:57:43,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3441473.3333333335, ans=0.125 2023-11-26 14:57:44,462 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:57:53,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2023-11-26 14:58:01,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2023-11-26 14:58:03,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3441606.6666666665, ans=0.125 2023-11-26 14:58:10,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-26 14:58:12,624 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:58:12,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3441673.3333333335, ans=0.125 2023-11-26 14:58:13,521 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11250, loss[loss=0.05757, simple_loss=0.0775, pruned_loss=0.01194, audio_tagging_loss=0.006888, over 17376.00 frames. 
], tot_loss[loss=0.06553, simple_loss=0.08852, pruned_loss=0.01217, audio_tagging_loss=0.009099, over 3058063.12 frames. ], batch size: 65, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:58:22,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3441673.3333333335, ans=0.125 2023-11-26 14:58:24,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3441740.0, ans=0.09899494936611666 2023-11-26 14:58:28,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441740.0, ans=0.1 2023-11-26 14:58:48,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3441873.3333333335, ans=0.125 2023-11-26 14:58:49,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3441873.3333333335, ans=0.125 2023-11-26 14:58:54,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2023-11-26 14:59:02,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3441940.0, ans=0.125 2023-11-26 14:59:05,980 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-26 14:59:09,213 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11300, loss[loss=0.04348, simple_loss=0.0573, pruned_loss=0.005309, audio_tagging_loss=0.009521, over 15779.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08898, pruned_loss=0.0122, audio_tagging_loss=0.008922, over 3056286.64 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:59:14,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3442006.6666666665, ans=0.125 2023-11-26 14:59:14,774 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2023-11-26 14:59:24,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.684e+01 9.357e+01 1.017e+02 1.284e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 14:59:31,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3442140.0, ans=0.125 2023-11-26 14:59:32,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3442140.0, ans=0.0 2023-11-26 14:59:40,143 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:00:02,085 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-26 15:00:05,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-26 15:00:05,801 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11350, loss[loss=0.05033, simple_loss=0.06627, pruned_loss=0.007396, audio_tagging_loss=0.009798, over 14408.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09028, pruned_loss=0.01234, audio_tagging_loss=0.008831, over 3053646.00 frames. 
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:00:21,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3442406.6666666665, ans=0.0 2023-11-26 15:00:33,604 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-26 15:00:34,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3442473.3333333335, ans=0.0 2023-11-26 15:00:58,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516400 2023-11-26 15:01:01,797 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11400, loss[loss=0.08442, simple_loss=0.1185, pruned_loss=0.01738, audio_tagging_loss=0.007766, over 16828.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09009, pruned_loss=0.01231, audio_tagging_loss=0.008806, over 3048754.90 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:01:09,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3442673.3333333335, ans=0.125 2023-11-26 15:01:15,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.777e+01 9.213e+01 1.005e+02 1.277e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 15:01:26,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3442806.6666666665, ans=0.95 2023-11-26 15:01:29,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2023-11-26 15:01:46,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3442940.0, ans=10.0 2023-11-26 15:01:49,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3442940.0, ans=0.125 2023-11-26 15:01:53,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516450 2023-11-26 15:01:57,081 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11450, loss[loss=0.05757, simple_loss=0.08637, pruned_loss=0.009119, audio_tagging_loss=0.005263, over 14576.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09003, pruned_loss=0.01248, audio_tagging_loss=0.008736, over 3050648.54 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:02:14,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2023-11-26 15:02:49,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-26 15:02:49,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3443273.3333333335, ans=0.5 2023-11-26 15:02:53,540 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11500, loss[loss=0.066, simple_loss=0.093, pruned_loss=0.01277, audio_tagging_loss=0.006726, over 15179.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08937, pruned_loss=0.01237, audio_tagging_loss=0.008797, over 3051447.84 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:02:57,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.42 vs. 
limit=12.0 2023-11-26 15:03:02,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3443340.0, ans=0.125 2023-11-26 15:03:08,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.894e+01 9.338e+01 1.016e+02 1.234e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 15:03:10,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3443406.6666666665, ans=0.125 2023-11-26 15:03:28,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3443540.0, ans=0.0 2023-11-26 15:03:36,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3443540.0, ans=0.125 2023-11-26 15:03:42,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3443606.6666666665, ans=0.0 2023-11-26 15:03:46,765 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-26 15:03:49,909 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11550, loss[loss=0.04892, simple_loss=0.06812, pruned_loss=0.006638, audio_tagging_loss=0.008219, over 15051.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08933, pruned_loss=0.01233, audio_tagging_loss=0.0087, over 3047418.03 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:03:55,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3443673.3333333335, ans=0.125 2023-11-26 15:04:01,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3443740.0, ans=0.1 2023-11-26 15:04:22,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3443873.3333333335, ans=0.125 2023-11-26 15:04:25,131 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:04:32,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3443873.3333333335, ans=0.0 2023-11-26 15:04:42,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516600 2023-11-26 15:04:45,628 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11600, loss[loss=0.04774, simple_loss=0.06145, pruned_loss=0.005906, audio_tagging_loss=0.0111, over 13499.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08876, pruned_loss=0.01211, audio_tagging_loss=0.008654, over 3044025.68 frames. 
], batch size: 52, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 15:04:46,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3444006.6666666665, ans=0.1 2023-11-26 15:04:54,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3444006.6666666665, ans=0.0 2023-11-26 15:04:59,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3444073.3333333335, ans=0.0 2023-11-26 15:05:00,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.905e+01 9.507e+01 1.006e+02 1.398e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 15:05:29,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3444273.3333333335, ans=0.0 2023-11-26 15:05:32,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3444273.3333333335, ans=0.2 2023-11-26 15:05:37,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516650 2023-11-26 15:05:41,443 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11650, loss[loss=0.05835, simple_loss=0.08401, pruned_loss=0.009964, audio_tagging_loss=0.006377, over 15769.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08912, pruned_loss=0.01213, audio_tagging_loss=0.008654, over 3043198.23 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:06:05,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444473.3333333335, ans=0.1 2023-11-26 15:06:23,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3444540.0, ans=0.5 2023-11-26 15:06:34,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516700 2023-11-26 15:06:37,974 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11700, loss[loss=0.05159, simple_loss=0.07191, pruned_loss=0.005763, audio_tagging_loss=0.009873, over 15770.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08898, pruned_loss=0.0121, audio_tagging_loss=0.008616, over 3047620.74 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:06:47,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3444740.0, ans=0.2 2023-11-26 15:06:52,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 8.754e+01 9.292e+01 9.879e+01 2.063e+02, threshold=1.858e+02, percent-clipped=1.0 2023-11-26 15:07:09,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3444873.3333333335, ans=0.0 2023-11-26 15:07:23,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=28.32 vs. limit=22.5 2023-11-26 15:07:29,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516750 2023-11-26 15:07:32,643 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11750, loss[loss=0.06426, simple_loss=0.08936, pruned_loss=0.01148, audio_tagging_loss=0.008101, over 14678.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08896, pruned_loss=0.01211, audio_tagging_loss=0.008621, over 3049031.23 frames. 
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:07:43,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3445073.3333333335, ans=0.0 2023-11-26 15:07:47,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-26 15:07:51,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-26 15:07:55,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3445140.0, ans=0.5 2023-11-26 15:07:58,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3445140.0, ans=0.0 2023-11-26 15:08:02,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445140.0, ans=0.1 2023-11-26 15:08:24,952 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516800 2023-11-26 15:08:28,243 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11800, loss[loss=0.08609, simple_loss=0.1159, pruned_loss=0.01937, audio_tagging_loss=0.008774, over 15363.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08833, pruned_loss=0.01204, audio_tagging_loss=0.008733, over 3051917.89 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:08:28,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3445340.0, ans=0.0 2023-11-26 15:08:30,140 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:08:44,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3445406.6666666665, ans=0.07 2023-11-26 15:08:45,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.777e+01 9.316e+01 1.001e+02 1.366e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 15:08:46,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3445406.6666666665, ans=0.125 2023-11-26 15:08:55,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3445473.3333333335, ans=0.0 2023-11-26 15:08:57,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3445473.3333333335, ans=0.125 2023-11-26 15:09:00,885 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2023-11-26 15:09:03,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3445540.0, ans=0.5 2023-11-26 15:09:09,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=15.0 2023-11-26 15:09:20,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3445606.6666666665, ans=0.125 2023-11-26 15:09:22,276 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516850 2023-11-26 15:09:25,434 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11850, loss[loss=0.06038, simple_loss=0.07886, pruned_loss=0.01066, audio_tagging_loss=0.01029, over 15740.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08892, pruned_loss=0.01226, audio_tagging_loss=0.008841, over 3057132.47 frames. ], batch size: 61, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:09:45,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3445740.0, ans=0.125 2023-11-26 15:10:03,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3445873.3333333335, ans=0.07 2023-11-26 15:10:17,986 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516900 2023-11-26 15:10:21,072 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11900, loss[loss=0.05178, simple_loss=0.06214, pruned_loss=0.01008, audio_tagging_loss=0.01063, over 14111.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0889, pruned_loss=0.01211, audio_tagging_loss=0.008929, over 3051936.36 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:10:33,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3446073.3333333335, ans=0.125 2023-11-26 15:10:35,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.570e+01 9.365e+01 9.968e+01 1.257e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 15:10:52,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2023-11-26 15:10:55,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3446206.6666666665, ans=0.0 2023-11-26 15:11:07,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3446273.3333333335, ans=0.125 2023-11-26 15:11:12,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3446273.3333333335, ans=0.2 2023-11-26 15:11:13,056 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516950 2023-11-26 15:11:16,148 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11950, loss[loss=0.06648, simple_loss=0.09187, pruned_loss=0.01146, audio_tagging_loss=0.009087, over 15538.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08802, pruned_loss=0.01193, audio_tagging_loss=0.009066, over 3056761.05 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:11:16,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3446340.0, ans=0.0 2023-11-26 15:11:23,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3446340.0, ans=0.0 2023-11-26 15:11:24,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. 
limit=12.0 2023-11-26 15:11:27,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-26 15:11:50,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3446540.0, ans=0.125 2023-11-26 15:11:59,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3446606.6666666665, ans=0.015 2023-11-26 15:12:06,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3446606.6666666665, ans=0.125 2023-11-26 15:12:07,702 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517000 2023-11-26 15:12:11,025 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 12000, loss[loss=0.07458, simple_loss=0.103, pruned_loss=0.01459, audio_tagging_loss=0.008495, over 15195.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08859, pruned_loss=0.01214, audio_tagging_loss=0.00907, over 3056490.61 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 15:12:11,025 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 15:12:30,869 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6155, 3.6636, 3.9423, 3.5395], device='cuda:2') 2023-11-26 15:12:43,909 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05829, simple_loss=0.05056, pruned_loss=0.00528, audio_tagging_loss=0.02773, over 4681554.00 frames. 2023-11-26 15:12:43,909 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 15:12:58,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 8.906e+01 9.562e+01 1.016e+02 1.213e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 15:13:38,089 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 0, loss[loss=0.08695, simple_loss=0.1083, pruned_loss=0.01454, audio_tagging_loss=0.01829, over 15479.00 frames. ], tot_loss[loss=0.08695, simple_loss=0.1083, pruned_loss=0.01454, audio_tagging_loss=0.01829, over 15479.00 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:13:38,090 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 15:14:01,262 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3157, 4.2859, 4.4880, 4.4788], device='cuda:2') 2023-11-26 15:14:09,404 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05821, simple_loss=0.05063, pruned_loss=0.005319, audio_tagging_loss=0.02758, over 4681554.00 frames. 
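For reference, the recurring optim.py:476 records above ("Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=...") summarize the distribution of recent gradient norms: the five numbers read as min/25%/median/75%/max, and the threshold tracks clipping_scale times the median (e.g. 2.0 x 9.562e+01 ~= 1.912e+02 in the record just above). Below is a minimal sketch of that bookkeeping, assuming a deque of recent norms; it is illustrative only and not the actual icefall optim.py / ScaledAdam implementation.

# Minimal sketch of median-based gradient clipping, illustrative only --
# not the actual icefall optim.py implementation. It mirrors the shape of
# the log records above: quartiles of recent gradient norms, a threshold
# of clipping_scale * median, and the share of recent batches clipped.
from collections import deque

import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)    # recent total grad norms
        self.clipped = deque(maxlen=history)  # 1.0 where a batch was clipped

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # Total gradient norm across all parameters for this batch.
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)

        # Quartiles (min / 25% / median / 75% / max) over the recent history.
        ordered = sorted(self.norms)
        quartiles = [ordered[int(q * (len(ordered) - 1))]
                     for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # clipping_scale * median

        clipped = norm > threshold
        self.clipped.append(float(clipped))
        if clipped:
            for p in params:
                p.grad.mul_(threshold / norm)

        percent = 100.0 * sum(self.clipped) / len(self.clipped)
        print(
            f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
            + " ".join(f"{q:.3e}" for q in quartiles)
            + f", threshold={threshold:.3e}, percent-clipped={percent}"
        )
        return norm

A clipper like this would run once per optimizer step, e.g. clipper(model.parameters()) just before optimizer.step(); the print statement then yields records in the same format as the ones logged above.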
2023-11-26 15:14:09,404 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 15:14:14,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3446840.0, ans=0.0 2023-11-26 15:14:14,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3446840.0, ans=0.0 2023-11-26 15:14:34,456 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517050 2023-11-26 15:14:34,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3446973.3333333335, ans=0.125 2023-11-26 15:14:51,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.73 vs. limit=10.0 2023-11-26 15:14:54,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447106.6666666665, ans=0.1 2023-11-26 15:14:57,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3447106.6666666665, ans=0.0 2023-11-26 15:14:58,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=22.5 2023-11-26 15:14:59,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3447106.6666666665, ans=0.125 2023-11-26 15:15:05,084 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 50, loss[loss=0.08031, simple_loss=0.09694, pruned_loss=0.01503, audio_tagging_loss=0.01681, over 16386.00 frames. ], tot_loss[loss=0.07523, simple_loss=0.09058, pruned_loss=0.01299, audio_tagging_loss=0.01695, over 687564.80 frames. ], batch size: 63, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:15:05,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447173.3333333335, ans=0.1 2023-11-26 15:15:26,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3447240.0, ans=0.0 2023-11-26 15:15:30,263 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517100 2023-11-26 15:15:33,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3447306.6666666665, ans=0.0 2023-11-26 15:15:46,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3447373.3333333335, ans=0.125 2023-11-26 15:15:47,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3447373.3333333335, ans=15.0 2023-11-26 15:15:48,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.397e+01 9.647e+01 1.037e+02 1.149e+02 1.439e+02, threshold=2.073e+02, percent-clipped=0.0 2023-11-26 15:15:57,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3447440.0, ans=0.125 2023-11-26 15:16:01,679 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 100, loss[loss=0.06726, simple_loss=0.08066, pruned_loss=0.009569, audio_tagging_loss=0.01736, over 14449.00 frames. 
], tot_loss[loss=0.07335, simple_loss=0.08917, pruned_loss=0.01257, audio_tagging_loss=0.0162, over 1208666.88 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:16:05,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447506.6666666665, ans=0.1 2023-11-26 15:16:13,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.71 vs. limit=5.0 2023-11-26 15:16:21,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3447573.3333333335, ans=0.125 2023-11-26 15:16:26,512 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517150 2023-11-26 15:16:30,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3447640.0, ans=0.125 2023-11-26 15:16:35,268 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:16:39,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447706.6666666665, ans=0.1 2023-11-26 15:16:40,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3447706.6666666665, ans=0.2 2023-11-26 15:16:47,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=22.5 2023-11-26 15:16:54,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=22.5 2023-11-26 15:16:58,402 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 150, loss[loss=0.07496, simple_loss=0.09241, pruned_loss=0.01519, audio_tagging_loss=0.01356, over 15688.00 frames. ], tot_loss[loss=0.07182, simple_loss=0.08991, pruned_loss=0.01247, audio_tagging_loss=0.0144, over 1617131.72 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:17:07,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2023-11-26 15:17:23,541 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517200 2023-11-26 15:17:36,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3448040.0, ans=0.125 2023-11-26 15:17:39,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0 2023-11-26 15:17:43,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. 
limit=6.0 2023-11-26 15:17:43,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.130e+01 9.675e+01 1.049e+02 1.216e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 15:17:47,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3448106.6666666665, ans=0.1 2023-11-26 15:17:54,480 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 200, loss[loss=0.08663, simple_loss=0.1231, pruned_loss=0.01754, audio_tagging_loss=0.007513, over 15700.00 frames. ], tot_loss[loss=0.07077, simple_loss=0.09083, pruned_loss=0.01255, audio_tagging_loss=0.01281, over 1927987.80 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:18:12,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3448240.0, ans=0.0 2023-11-26 15:18:19,042 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517250 2023-11-26 15:18:24,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3448306.6666666665, ans=0.2 2023-11-26 15:18:36,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448373.3333333335, ans=0.1 2023-11-26 15:18:36,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-26 15:18:51,358 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 250, loss[loss=0.06921, simple_loss=0.08934, pruned_loss=0.01511, audio_tagging_loss=0.009427, over 14978.00 frames. ], tot_loss[loss=0.06922, simple_loss=0.09067, pruned_loss=0.01239, audio_tagging_loss=0.0115, over 2175704.00 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:19:13,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448640.0, ans=0.1 2023-11-26 15:19:15,426 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517300 2023-11-26 15:19:19,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3448640.0, ans=0.05 2023-11-26 15:19:36,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.917e+01 9.750e+01 1.047e+02 1.492e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 15:19:41,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3448773.3333333335, ans=0.125 2023-11-26 15:19:46,943 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 300, loss[loss=0.07846, simple_loss=0.1116, pruned_loss=0.01531, audio_tagging_loss=0.007363, over 14774.00 frames. ], tot_loss[loss=0.06871, simple_loss=0.09109, pruned_loss=0.0125, audio_tagging_loss=0.01066, over 2367690.54 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:19:52,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3448840.0, ans=0.2 2023-11-26 15:19:59,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3448906.6666666665, ans=0.2 2023-11-26 15:20:01,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.46 vs. 
2023-11-26 15:20:12,209 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517350
2023-11-26 15:20:15,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3448973.3333333335, ans=0.1
2023-11-26 15:20:40,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3449106.6666666665, ans=0.125
2023-11-26 15:20:43,513 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 350, loss[loss=0.04515, simple_loss=0.05506, pruned_loss=0.007084, audio_tagging_loss=0.01053, over 16804.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.0904, pruned_loss=0.01231, audio_tagging_loss=0.01016, over 2515818.88 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:20:49,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3449173.3333333335, ans=0.125
2023-11-26 15:20:55,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3449240.0, ans=0.125
2023-11-26 15:20:56,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3449240.0, ans=0.2
2023-11-26 15:21:01,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=12.0
2023-11-26 15:21:07,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517400
2023-11-26 15:21:18,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3449373.3333333335, ans=0.2
2023-11-26 15:21:23,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3449373.3333333335, ans=0.125
2023-11-26 15:21:28,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.037e+01 9.510e+01 1.047e+02 1.188e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-26 15:21:28,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3449440.0, ans=0.1
2023-11-26 15:21:37,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3449440.0, ans=0.0
2023-11-26 15:21:40,202 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 400, loss[loss=0.07122, simple_loss=0.09919, pruned_loss=0.009647, audio_tagging_loss=0.01198, over 15865.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.08978, pruned_loss=0.01219, audio_tagging_loss=0.009863, over 2635747.56 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:21:40,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0
2023-11-26 15:21:44,794 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:22:03,996 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517450
2023-11-26 15:22:13,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0
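Each [scaling.py:213] line reports the current value (ans=...) of a ScheduledFloat hyperparameter: a dropout probability, skip rate, or scale floor that is a function of batch_count rather than a constant. At batch_count around 3.45e6 these schedules are long past their final breakpoint, so the printed values no longer move. A sketch of a piecewise-linear schedule in that spirit (the breakpoints below are made up for illustration, not the run's actual ones):

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count, in the spirit of the
        zipformer's ScheduledFloat (illustrative sketch, not the icefall source)."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                # The schedules logged above are far past their last breakpoint.
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(3448973.33))  # -> 0.1, matching ans=0.1 in the log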
2023-11-26 15:22:23,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3449773.3333333335, ans=0.125
2023-11-26 15:22:26,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3449773.3333333335, ans=0.0
2023-11-26 15:22:33,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3449773.3333333335, ans=0.0
2023-11-26 15:22:35,244 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 450, loss[loss=0.05923, simple_loss=0.08156, pruned_loss=0.00981, audio_tagging_loss=0.008639, over 15129.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08947, pruned_loss=0.01222, audio_tagging_loss=0.009612, over 2721964.56 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:23:00,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517500
2023-11-26 15:23:00,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0
2023-11-26 15:23:20,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.956e+01 9.480e+01 1.000e+02 1.239e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 15:23:31,858 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 500, loss[loss=0.07905, simple_loss=0.1134, pruned_loss=0.01504, audio_tagging_loss=0.007287, over 14710.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08928, pruned_loss=0.01225, audio_tagging_loss=0.009402, over 2796515.11 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:23:34,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3450173.3333333335, ans=0.1
2023-11-26 15:23:47,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3450240.0, ans=0.0
2023-11-26 15:23:48,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3450240.0, ans=0.125
2023-11-26 15:23:49,651 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:23:57,010 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517550
2023-11-26 15:24:01,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3450306.6666666665, ans=0.0
2023-11-26 15:24:06,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3450373.3333333335, ans=0.0
2023-11-26 15:24:28,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3450506.6666666665, ans=0.2
2023-11-26 15:24:28,925 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 550, loss[loss=0.05664, simple_loss=0.06715, pruned_loss=0.01465, audio_tagging_loss=0.008416, over 14619.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0902, pruned_loss=0.01232, audio_tagging_loss=0.009158, over 2854584.04 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:24:52,346 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517600
2023-11-26 15:24:53,505 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:25:14,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.780e+01 9.554e+01 1.038e+02 1.321e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 15:25:14,767 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:25:24,060 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 600, loss[loss=0.04606, simple_loss=0.05972, pruned_loss=0.005908, audio_tagging_loss=0.01029, over 15146.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08945, pruned_loss=0.01228, audio_tagging_loss=0.009159, over 2894316.24 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:25:34,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3450906.6666666665, ans=0.0
2023-11-26 15:25:36,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3450906.6666666665, ans=0.0
2023-11-26 15:25:48,507 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517650
2023-11-26 15:25:56,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3450973.3333333335, ans=0.1
2023-11-26 15:26:09,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3451106.6666666665, ans=0.0
2023-11-26 15:26:11,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3451106.6666666665, ans=0.125
2023-11-26 15:26:14,128 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:26:14,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3451106.6666666665, ans=0.025
2023-11-26 15:26:15,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3451106.6666666665, ans=0.125
2023-11-26 15:26:19,385 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 650, loss[loss=0.06459, simple_loss=0.08966, pruned_loss=0.01193, audio_tagging_loss=0.007826, over 14819.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09023, pruned_loss=0.01243, audio_tagging_loss=0.009111, over 2930403.62 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0
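Each [train_asr.py:1235] line reports the current batch in loss[...] and a running, frame-weighted average in tot_loss[...]; the "over N frames." counter is the accumulated frame count behind that average. The printed components are consistent with a weighted sum using the configured simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0: for batch 600 above, 0.5*0.05972 + 0.005908 + 0.01029 is about 0.04606, the logged loss. A sketch of both pieces (hedged: icefall's metrics tracker may also decay old statistics, which this plain accumulator does not):

    def combine(simple_loss, pruned_loss, at_loss,
                simple_scale=0.5, at_scale=1.0):
        # Matches the printed numbers:
        # loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss.
        return simple_scale * simple_loss + pruned_loss + at_scale * at_loss

    class RunningLoss:
        """Frame-weighted running average, like the tot_loss[...] fields."""

        def __init__(self):
            self.sums = {}     # loss-name -> frame-weighted sum
            self.frames = 0.0

        def update(self, losses: dict, num_frames: float):
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames
            self.frames += num_frames

        def averages(self) -> dict:
            return {name: s / self.frames for name, s in self.sums.items()}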
2023-11-26 15:26:27,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3451173.3333333335, ans=0.1
2023-11-26 15:26:36,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3451240.0, ans=0.125
2023-11-26 15:26:45,075 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517700
2023-11-26 15:26:49,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3451306.6666666665, ans=0.125
2023-11-26 15:26:54,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3451373.3333333335, ans=0.2
2023-11-26 15:27:03,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5
2023-11-26 15:27:06,894 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.698e+01 9.351e+01 9.946e+01 1.223e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 15:27:11,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3451440.0, ans=0.1
2023-11-26 15:27:15,787 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 700, loss[loss=0.07153, simple_loss=0.09614, pruned_loss=0.015, audio_tagging_loss=0.008465, over 15757.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09038, pruned_loss=0.01247, audio_tagging_loss=0.009012, over 2957972.02 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:27:23,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0
2023-11-26 15:27:32,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2023-11-26 15:27:40,476 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517750
2023-11-26 15:27:57,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3451706.6666666665, ans=0.125
2023-11-26 15:28:01,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3451773.3333333335, ans=0.0
2023-11-26 15:28:12,465 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 750, loss[loss=0.07162, simple_loss=0.09949, pruned_loss=0.0117, audio_tagging_loss=0.01017, over 15210.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09045, pruned_loss=0.01253, audio_tagging_loss=0.009004, over 2984553.83 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:28:27,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.86 vs. limit=15.0
2023-11-26 15:28:34,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3451973.3333333335, ans=0.125
2023-11-26 15:28:36,631 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517800
2023-11-26 15:28:45,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0
2023-11-26 15:28:59,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.944e+01 9.681e+01 1.076e+02 1.736e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-26 15:29:04,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3452106.6666666665, ans=0.0
2023-11-26 15:29:05,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3452106.6666666665, ans=0.2
2023-11-26 15:29:08,441 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 800, loss[loss=0.08021, simple_loss=0.1086, pruned_loss=0.01673, audio_tagging_loss=0.009184, over 15915.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09072, pruned_loss=0.01256, audio_tagging_loss=0.00898, over 2995906.43 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:29:18,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3452240.0, ans=0.0
2023-11-26 15:29:24,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0
2023-11-26 15:29:27,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3452240.0, ans=0.0
2023-11-26 15:29:28,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3452240.0, ans=0.125
2023-11-26 15:29:34,028 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517850
2023-11-26 15:29:36,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3452306.6666666665, ans=0.07
2023-11-26 15:29:36,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=12.0
2023-11-26 15:29:46,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0
2023-11-26 15:29:58,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3452440.0, ans=0.125
2023-11-26 15:30:03,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3452506.6666666665, ans=0.125
2023-11-26 15:30:04,063 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 850, loss[loss=0.05885, simple_loss=0.07783, pruned_loss=0.01111, audio_tagging_loss=0.008821, over 14883.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08986, pruned_loss=0.01249, audio_tagging_loss=0.008999, over 3006677.63 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:30:29,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517900
2023-11-26 15:30:41,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3452706.6666666665, ans=0.0
2023-11-26 15:30:52,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.827e+01 9.589e+01 1.017e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-26 15:31:00,598 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 900, loss[loss=0.06281, simple_loss=0.08959, pruned_loss=0.00876, audio_tagging_loss=0.009251, over 15254.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08931, pruned_loss=0.01241, audio_tagging_loss=0.009066, over 3014285.27 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:31:05,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0
2023-11-26 15:31:13,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0
2023-11-26 15:31:24,585 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517950
2023-11-26 15:31:24,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3452973.3333333335, ans=0.0
2023-11-26 15:31:40,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=12.0
2023-11-26 15:31:41,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3453040.0, ans=0.0
2023-11-26 15:31:54,755 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 950, loss[loss=0.05675, simple_loss=0.07441, pruned_loss=0.009156, audio_tagging_loss=0.01039, over 15350.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08968, pruned_loss=0.01238, audio_tagging_loss=0.009049, over 3027523.92 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:32:02,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0
2023-11-26 15:32:06,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3453240.0, ans=0.1
2023-11-26 15:32:08,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3453240.0, ans=0.125
2023-11-26 15:32:20,101 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518000
2023-11-26 15:32:28,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3453373.3333333335, ans=0.2
2023-11-26 15:32:29,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3453373.3333333335, ans=0.0
2023-11-26 15:32:36,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3453373.3333333335, ans=0.2
2023-11-26 15:32:41,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.765e+01 9.471e+01 9.957e+01 1.208e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-26 15:32:50,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3453506.6666666665, ans=0.2
2023-11-26 15:32:50,869 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1000, loss[loss=0.07777, simple_loss=0.09954, pruned_loss=0.01646, audio_tagging_loss=0.01153, over 14997.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09025, pruned_loss=0.01261, audio_tagging_loss=0.008864, over 3036254.23 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:32:57,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=22.5
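grad_scale on the loss lines is the dynamic loss scale of the fp16 run, and it moves in powers of two: 32.0 at batch 400, back to 16.0 by batch 500, down to 8.0 by batch 700, then 16.0 again at batch 800. That is the usual mixed-precision pattern of halving the scale when a step overflows and doubling it again after a stretch of clean steps. A sketch with PyTorch's stock GradScaler (the growth/backoff constants below are torch defaults, not necessarily what this run used, and model/loss_fn are placeholders):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # comparable to the grad_scale values in the log
        growth_factor=2.0,     # double after `growth_interval` clean steps
        backoff_factor=0.5,    # halve whenever a step produces inf/nan grads
        growth_interval=2000,  # torch default; the run's own interval may differ
    )

    def training_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if gradients overflowed
        scaler.update()         # here grad_scale moves 32 -> 16 -> 8 -> 16 ...
        return scaler.get_scale()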
2023-11-26 15:33:11,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3453573.3333333335, ans=0.125
2023-11-26 15:33:14,257 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:33:15,331 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518050
2023-11-26 15:33:31,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3453706.6666666665, ans=0.125
2023-11-26 15:33:33,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3453706.6666666665, ans=0.125
2023-11-26 15:33:35,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3453773.3333333335, ans=0.125
2023-11-26 15:33:46,940 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1050, loss[loss=0.07532, simple_loss=0.1051, pruned_loss=0.01567, audio_tagging_loss=0.007119, over 15222.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08995, pruned_loss=0.01262, audio_tagging_loss=0.00869, over 3034828.05 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:33:49,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3453840.0, ans=0.125
2023-11-26 15:33:56,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3453906.6666666665, ans=0.125
2023-11-26 15:33:56,829 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:33:57,830 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:34:02,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3453906.6666666665, ans=0.0
2023-11-26 15:34:11,136 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518100
2023-11-26 15:34:33,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.861e+01 9.465e+01 1.011e+02 1.415e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 15:34:35,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.95 vs. limit=22.5
2023-11-26 15:34:42,001 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1100, loss[loss=0.06598, simple_loss=0.08738, pruned_loss=0.01384, audio_tagging_loss=0.008453, over 15274.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08941, pruned_loss=0.01247, audio_tagging_loss=0.008687, over 3038139.39 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:34:45,230 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:34:47,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3454173.3333333335, ans=0.1
2023-11-26 15:34:51,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3454240.0, ans=0.0
2023-11-26 15:34:55,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3454240.0, ans=0.0
2023-11-26 15:35:04,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0
2023-11-26 15:35:06,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518150
2023-11-26 15:35:11,622 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:35:13,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0
2023-11-26 15:35:28,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3454440.0, ans=0.2
2023-11-26 15:35:37,332 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1150, loss[loss=0.06428, simple_loss=0.08796, pruned_loss=0.01043, audio_tagging_loss=0.009878, over 15333.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08897, pruned_loss=0.01247, audio_tagging_loss=0.008673, over 3038159.45 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:35:46,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3454506.6666666665, ans=0.1
2023-11-26 15:36:01,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518200
2023-11-26 15:36:10,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3454706.6666666665, ans=0.125
2023-11-26 15:36:12,954 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:36:22,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3454773.3333333335, ans=0.2
2023-11-26 15:36:24,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.766e+01 9.295e+01 9.999e+01 1.209e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-26 15:36:28,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3454773.3333333335, ans=0.0
2023-11-26 15:36:28,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3454773.3333333335, ans=0.1
2023-11-26 15:36:32,909 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1200, loss[loss=0.06722, simple_loss=0.09299, pruned_loss=0.01111, audio_tagging_loss=0.009607, over 16064.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08936, pruned_loss=0.01246, audio_tagging_loss=0.008655, over 3033387.37 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0
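The WARNING lines here and below drop one-second AudioSet clips from training: AudioSet has no transcripts, so each of these cuts carries the dummy placeholder text, and after the front end's subsampling a 100-frame clip yields only 23 encoder frames, fewer than its 24 BPE tokens, which leaves the transducer loss with no valid monotonic alignment. A sketch of such a filter (the subsampling arithmetic reproduces the logged 100 -> 23, but the real check at train_asr.py:1481 may differ in detail):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Illustrative version of the check behind the 'Exclude cut' warnings."""
        # Frame count after the convolutional subsampling front end; this
        # approximation reproduces the logged numbers (100 frames -> 23).
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        # With fewer encoder frames than tokens there is no monotonic
        # alignment, so the transducer loss is undefined for this cut.
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False -> the cut is excluded, as warned above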
2023-11-26 15:36:34,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3454840.0, ans=0.125
2023-11-26 15:36:57,058 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518250
2023-11-26 15:37:04,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0
2023-11-26 15:37:08,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3455040.0, ans=0.05
2023-11-26 15:37:27,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3455173.3333333335, ans=0.05
2023-11-26 15:37:28,273 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1250, loss[loss=0.07151, simple_loss=0.1021, pruned_loss=0.01382, audio_tagging_loss=0.006619, over 15941.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08946, pruned_loss=0.01237, audio_tagging_loss=0.008647, over 3035162.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:37:52,865 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518300
2023-11-26 15:37:54,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3455306.6666666665, ans=0.125
2023-11-26 15:37:57,253 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:38:00,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0
2023-11-26 15:38:16,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.858e+01 9.436e+01 1.015e+02 1.276e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 15:38:20,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3455440.0, ans=0.0
2023-11-26 15:38:23,790 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1300, loss[loss=0.04022, simple_loss=0.04231, pruned_loss=0.007176, audio_tagging_loss=0.01189, over 14322.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08864, pruned_loss=0.0122, audio_tagging_loss=0.008655, over 3034892.56 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:38:33,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3455573.3333333335, ans=0.125
2023-11-26 15:38:36,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3455573.3333333335, ans=0.125
2023-11-26 15:38:48,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518350
2023-11-26 15:39:01,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455706.6666666665, ans=0.1
2023-11-26 15:39:06,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3455706.6666666665, ans=0.0
2023-11-26 15:39:19,313 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1350, loss[loss=0.05632, simple_loss=0.07292, pruned_loss=0.009562, audio_tagging_loss=0.0103, over 15903.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08832, pruned_loss=0.01224, audio_tagging_loss=0.008675, over 3034991.41 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:39:34,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.60 vs. limit=6.0
2023-11-26 15:39:43,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518400
2023-11-26 15:39:43,639 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:40:00,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3456040.0, ans=0.0
2023-11-26 15:40:01,173 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:40:08,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.757e+01 9.290e+01 1.016e+02 1.312e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 15:40:15,038 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1400, loss[loss=0.07083, simple_loss=0.09544, pruned_loss=0.01179, audio_tagging_loss=0.01133, over 14884.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08802, pruned_loss=0.01197, audio_tagging_loss=0.008696, over 3035295.75 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:40:19,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3456173.3333333335, ans=0.0
2023-11-26 15:40:22,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3456173.3333333335, ans=0.0
2023-11-26 15:40:39,869 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518450
2023-11-26 15:40:47,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3456306.6666666665, ans=0.2
2023-11-26 15:41:10,960 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1450, loss[loss=0.05226, simple_loss=0.07007, pruned_loss=0.009585, audio_tagging_loss=0.007645, over 14763.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08801, pruned_loss=0.01183, audio_tagging_loss=0.008769, over 3035646.12 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:41:11,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0
2023-11-26 15:41:16,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3456506.6666666665, ans=0.125
2023-11-26 15:41:22,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0
2023-11-26 15:41:27,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3456573.3333333335, ans=0.125
2023-11-26 15:41:28,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3456573.3333333335, ans=0.1
2023-11-26 15:41:31,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3456573.3333333335, ans=0.0
2023-11-26 15:41:36,261 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518500
2023-11-26 15:41:49,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3456706.6666666665, ans=0.125
2023-11-26 15:42:00,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 9.010e+01 9.704e+01 1.035e+02 1.675e+02, threshold=1.941e+02, percent-clipped=0.0
2023-11-26 15:42:08,023 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1500, loss[loss=0.05623, simple_loss=0.07543, pruned_loss=0.009837, audio_tagging_loss=0.008678, over 14645.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08941, pruned_loss=0.01215, audio_tagging_loss=0.008765, over 3038466.56 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:42:15,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3456840.0, ans=0.125
2023-11-26 15:42:18,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3456906.6666666665, ans=0.125
2023-11-26 15:42:24,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3456906.6666666665, ans=0.07
2023-11-26 15:42:28,426 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5
2023-11-26 15:42:32,704 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518550
2023-11-26 15:42:54,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0
2023-11-26 15:43:03,423 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1550, loss[loss=0.09279, simple_loss=0.1252, pruned_loss=0.02122, audio_tagging_loss=0.008983, over 14680.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08962, pruned_loss=0.01228, audio_tagging_loss=0.008774, over 3037989.25 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 8.0
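The constant lr: 1.54e-03 on the loss lines is consistent with icefall's Eden schedule, which decays the learning rate in both the global batch index and the epoch: lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2)^-0.25 * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^-0.25. With this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5, a step around 519000 (the "Current batch idx" entries) and epoch 43 (zero-based; the exact epoch indexing is my assumption) this reproduces the logged value:

    def eden_lr(base_lr: float, step: int, epoch: int,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden learning-rate schedule (epoch indexing here is an assumption)."""
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, 519000, 43):.2e}")  # ~1.54e-03, as logged

At this depth into training both factors change so slowly that the printed lr stays at 1.54e-03 across the whole section.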
2023-11-26 15:43:09,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3457173.3333333335, ans=0.125
2023-11-26 15:43:27,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3457306.6666666665, ans=0.125
2023-11-26 15:43:27,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518600
2023-11-26 15:43:30,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3457306.6666666665, ans=0.125
2023-11-26 15:43:33,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3457306.6666666665, ans=0.0
2023-11-26 15:43:34,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3457306.6666666665, ans=0.125
2023-11-26 15:43:37,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0
2023-11-26 15:43:52,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.861e+01 9.494e+01 1.024e+02 1.186e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 15:43:53,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3457440.0, ans=10.0
2023-11-26 15:43:58,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3457506.6666666665, ans=0.125
2023-11-26 15:43:59,619 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1600, loss[loss=0.06745, simple_loss=0.09249, pruned_loss=0.01354, audio_tagging_loss=0.00767, over 15507.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08949, pruned_loss=0.01235, audio_tagging_loss=0.008907, over 3035050.27 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:44:01,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3457506.6666666665, ans=0.0
2023-11-26 15:44:08,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3457506.6666666665, ans=0.0
2023-11-26 15:44:11,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3457573.3333333335, ans=0.125
2023-11-26 15:44:24,891 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518650
2023-11-26 15:44:45,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3457773.3333333335, ans=0.125
2023-11-26 15:44:52,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3457773.3333333335, ans=0.2
2023-11-26 15:44:55,868 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1650, loss[loss=0.07045, simple_loss=0.09429, pruned_loss=0.01216, audio_tagging_loss=0.01115, over 15749.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08947, pruned_loss=0.01231, audio_tagging_loss=0.008937, over 3038811.07 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:44:58,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3457840.0, ans=0.5
2023-11-26 15:45:00,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0
2023-11-26 15:45:15,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=22.5
2023-11-26 15:45:20,418 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518700
2023-11-26 15:45:25,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3457973.3333333335, ans=0.0
2023-11-26 15:45:32,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3458040.0, ans=0.04949747468305833
2023-11-26 15:45:35,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3458040.0, ans=0.125
2023-11-26 15:45:41,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3458106.6666666665, ans=0.2
2023-11-26 15:45:42,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3458106.6666666665, ans=0.0
2023-11-26 15:45:45,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 9.023e+01 9.377e+01 1.009e+02 1.256e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-26 15:45:45,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3458106.6666666665, ans=0.125
2023-11-26 15:45:52,303 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1700, loss[loss=0.07372, simple_loss=0.1029, pruned_loss=0.01475, audio_tagging_loss=0.007499, over 15834.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08905, pruned_loss=0.01226, audio_tagging_loss=0.009021, over 3035891.77 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:46:05,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3458240.0, ans=0.125
2023-11-26 15:46:16,685 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518750
2023-11-26 15:46:21,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0
2023-11-26 15:46:39,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=22.5
2023-11-26 15:46:47,672 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1750, loss[loss=0.06217, simple_loss=0.07272, pruned_loss=0.0146, audio_tagging_loss=0.01121, over 14213.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.0893, pruned_loss=0.01214, audio_tagging_loss=0.009015, over 3040101.42 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:46:52,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=12.0
2023-11-26 15:47:13,180 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518800
2023-11-26 15:47:13,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0
2023-11-26 15:47:17,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=12.0
2023-11-26 15:47:18,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0
2023-11-26 15:47:18,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3458640.0, ans=10.0
2023-11-26 15:47:28,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3458706.6666666665, ans=0.125
2023-11-26 15:47:29,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3458706.6666666665, ans=0.0
2023-11-26 15:47:38,075 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.660e+01 9.290e+01 1.019e+02 1.190e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 15:47:44,455 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1800, loss[loss=0.06645, simple_loss=0.08446, pruned_loss=0.01642, audio_tagging_loss=0.007801, over 13314.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08908, pruned_loss=0.01204, audio_tagging_loss=0.00893, over 3038605.93 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:47:49,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3458840.0, ans=0.04949747468305833
2023-11-26 15:47:54,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3458840.0, ans=0.1
2023-11-26 15:48:06,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3458973.3333333335, ans=0.2
2023-11-26 15:48:08,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518850
2023-11-26 15:48:11,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3458973.3333333335, ans=0.125
2023-11-26 15:48:15,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3458973.3333333335, ans=0.125
2023-11-26 15:48:33,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5
2023-11-26 15:48:40,874 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1850, loss[loss=0.07911, simple_loss=0.1105, pruned_loss=0.01646, audio_tagging_loss=0.007403, over 15094.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08861, pruned_loss=0.01202, audio_tagging_loss=0.008932, over 3040589.49 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
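The [scaling.py:1022] Whitening lines print a diagnostic of how far a module's activations are from having a white (identity-proportional) covariance, compared against a per-module limit (6.0 to 22.5 above). As I read the zipformer's Whiten module, the metric is 1.0 for perfectly white features, grows as variance concentrates in a few directions, and a corrective gradient is applied only while the smoothed estimate exceeds the limit. A sketch under the assumption that the metric is n * trace(C^2) / trace(C)^2 per channel group (my reading of scaling.py, not a verified copy):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Covariance-anisotropy proxy: 1.0 when cov is proportional to identity."""
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x.transpose(0, 1)                     # (groups, frames, chans/group)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / num_frames  # per-group covariance
        n = cov.shape[-1]
        trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
        metric = n * (cov * cov).sum(dim=(1, 2)) / trace.pow(2)
        return float(metric.mean())

    # For white features the metric sits close to its floor of 1.0,
    # far below limits like 15.0:
    print(whitening_metric(torch.randn(1000, 384)))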
2023-11-26 15:49:05,179 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518900
2023-11-26 15:49:08,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3459306.6666666665, ans=0.125
2023-11-26 15:49:30,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 8.755e+01 9.422e+01 1.025e+02 1.230e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 15:49:36,704 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1900, loss[loss=0.08674, simple_loss=0.1163, pruned_loss=0.0215, audio_tagging_loss=0.00708, over 15125.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08944, pruned_loss=0.01217, audio_tagging_loss=0.008775, over 3051081.37 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:49:47,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0
2023-11-26 15:50:02,394 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518950
2023-11-26 15:50:12,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3459706.6666666665, ans=0.125
2023-11-26 15:50:33,048 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1950, loss[loss=0.06097, simple_loss=0.0783, pruned_loss=0.01226, audio_tagging_loss=0.009562, over 15139.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08914, pruned_loss=0.01205, audio_tagging_loss=0.008764, over 3037665.14 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:50:34,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459840.0, ans=0.1
2023-11-26 15:50:51,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3459906.6666666665, ans=0.125
2023-11-26 15:50:52,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0
2023-11-26 15:50:53,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3459906.6666666665, ans=0.0
2023-11-26 15:50:58,244 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519000
2023-11-26 15:51:17,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460106.6666666665, ans=0.1
2023-11-26 15:51:18,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3460106.6666666665, ans=0.125
2023-11-26 15:51:23,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.668e+01 9.341e+01 1.000e+02 1.329e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-26 15:51:24,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0
2023-11-26 15:51:30,362 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2000, loss[loss=0.07749, simple_loss=0.1024, pruned_loss=0.01777, audio_tagging_loss=0.008543, over 14549.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08912, pruned_loss=0.01197, audio_tagging_loss=0.008767, over 3035772.57 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:51:31,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3460173.3333333335, ans=0.125
2023-11-26 15:51:45,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3460240.0, ans=0.05
2023-11-26 15:51:54,383 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519050
2023-11-26 15:52:09,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3460373.3333333335, ans=0.0
2023-11-26 15:52:18,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3460440.0, ans=0.125
2023-11-26 15:52:19,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460440.0, ans=0.1
2023-11-26 15:52:20,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3460440.0, ans=0.125
2023-11-26 15:52:24,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3460440.0, ans=0.0
2023-11-26 15:52:26,022 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2050, loss[loss=0.05806, simple_loss=0.07614, pruned_loss=0.01089, audio_tagging_loss=0.009102, over 14085.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08952, pruned_loss=0.01216, audio_tagging_loss=0.008787, over 3032689.74 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:52:42,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460573.3333333335, ans=0.1
2023-11-26 15:52:51,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519100
2023-11-26 15:52:54,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=22.5
2023-11-26 15:53:00,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3460706.6666666665, ans=0.0
2023-11-26 15:53:03,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3460706.6666666665, ans=0.2
2023-11-26 15:53:13,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3460773.3333333335, ans=0.0
2023-11-26 15:53:15,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.697e+01 9.387e+01 1.014e+02 2.680e+02, threshold=1.877e+02, percent-clipped=1.0
2023-11-26 15:53:21,952 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2100, loss[loss=0.06535, simple_loss=0.0902, pruned_loss=0.01145, audio_tagging_loss=0.008792, over 14682.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08987, pruned_loss=0.01216, audio_tagging_loss=0.008738, over 3033839.39 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:53:35,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3460906.6666666665, ans=0.0
2023-11-26 15:53:37,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3460906.6666666665, ans=0.0
2023-11-26 15:53:46,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519150
2023-11-26 15:53:54,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0
2023-11-26 15:54:14,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3461106.6666666665, ans=0.0
2023-11-26 15:54:18,664 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2150, loss[loss=0.05137, simple_loss=0.06391, pruned_loss=0.00926, audio_tagging_loss=0.01015, over 15326.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09102, pruned_loss=0.01236, audio_tagging_loss=0.008703, over 3036838.01 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:54:26,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3461173.3333333335, ans=0.125
2023-11-26 15:54:28,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461240.0, ans=0.1
2023-11-26 15:54:30,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3461240.0, ans=0.04949747468305833
2023-11-26 15:54:41,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3461306.6666666665, ans=0.125
2023-11-26 15:54:42,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519200
2023-11-26 15:54:52,764 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:54:59,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3461373.3333333335, ans=0.0
2023-11-26 15:55:07,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.113e+01 9.715e+01 1.044e+02 1.389e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-26 15:55:11,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3461440.0, ans=0.0
2023-11-26 15:55:13,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3461506.6666666665, ans=0.125
2023-11-26 15:55:14,123 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2200, loss[loss=0.06044, simple_loss=0.08592, pruned_loss=0.01094, audio_tagging_loss=0.006538, over 15532.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09106, pruned_loss=0.01254, audio_tagging_loss=0.008736, over 3035542.64 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:55:19,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461506.6666666665, ans=0.1
2023-11-26 15:55:30,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461573.3333333335, ans=0.1
2023-11-26 15:55:39,020 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519250
2023-11-26 15:55:45,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.07 vs. limit=10.0
2023-11-26 15:55:48,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3461706.6666666665, ans=0.2
2023-11-26 15:55:59,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3461773.3333333335, ans=0.0
2023-11-26 15:56:03,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3461773.3333333335, ans=0.125
2023-11-26 15:56:07,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3461773.3333333335, ans=0.125
2023-11-26 15:56:10,445 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2250, loss[loss=0.04591, simple_loss=0.06134, pruned_loss=0.006259, audio_tagging_loss=0.008978, over 17166.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09079, pruned_loss=0.01245, audio_tagging_loss=0.008709, over 3044773.84 frames. ], batch size: 64, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:56:17,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3461840.0, ans=0.1
2023-11-26 15:56:19,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3461840.0, ans=0.125
2023-11-26 15:56:31,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461906.6666666665, ans=0.1
2023-11-26 15:56:35,495 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519300
2023-11-26 15:56:37,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3461973.3333333335, ans=0.0
2023-11-26 15:56:42,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3461973.3333333335, ans=0.125
2023-11-26 15:57:00,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.882e+01 9.360e+01 1.008e+02 1.473e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-26 15:57:06,964 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2300, loss[loss=0.07239, simple_loss=0.1013, pruned_loss=0.01463, audio_tagging_loss=0.007094, over 16080.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09032, pruned_loss=0.01227, audio_tagging_loss=0.00879, over 3044025.84 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:57:30,738 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519350
2023-11-26 15:57:30,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3462306.6666666665, ans=0.0
2023-11-26 15:57:31,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3462306.6666666665, ans=0.0
2023-11-26 15:57:35,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3462306.6666666665, ans=0.2
2023-11-26 15:57:50,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3462440.0, ans=0.1
2023-11-26 15:57:50,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3462440.0, ans=0.125
2023-11-26 15:57:50,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0
2023-11-26 15:57:56,353 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:58:00,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3462440.0, ans=0.125
2023-11-26 15:58:02,745 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2350, loss[loss=0.07133, simple_loss=0.1018, pruned_loss=0.01262, audio_tagging_loss=0.00783, over 15917.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09002, pruned_loss=0.01215, audio_tagging_loss=0.008814, over 3050951.01 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:58:10,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3462506.6666666665, ans=0.0
2023-11-26 15:58:18,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0
2023-11-26 15:58:23,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3462640.0, ans=0.2
2023-11-26 15:58:25,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3462640.0, ans=0.0
2023-11-26 15:58:26,779 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519400
2023-11-26 15:58:32,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3462640.0, ans=0.0
2023-11-26 15:58:44,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5
2023-11-26 15:58:52,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.835e+01 9.579e+01 1.049e+02 1.967e+02, threshold=1.916e+02, percent-clipped=1.0
2023-11-26 15:58:58,916 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2400, loss[loss=0.06487, simple_loss=0.0857, pruned_loss=0.01078, audio_tagging_loss=0.01124, over 15268.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09028, pruned_loss=0.01225, audio_tagging_loss=0.008901, over 3049665.70 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:59:01,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3462840.0, ans=0.125
2023-11-26 15:59:02,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3462840.0, ans=0.0
2023-11-26 15:59:17,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3462906.6666666665, ans=0.125
2023-11-26 15:59:18,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0
2023-11-26 15:59:22,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3462973.3333333335, ans=0.0
2023-11-26 15:59:23,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0
2023-11-26 15:59:24,276 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519450
2023-11-26 15:59:27,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3462973.3333333335, ans=10.0
2023-11-26 15:59:36,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3463040.0, ans=0.125
2023-11-26 15:59:40,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3463040.0, ans=0.035
2023-11-26 15:59:54,849 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2450, loss[loss=0.05921, simple_loss=0.07668, pruned_loss=0.01047, audio_tagging_loss=0.0104, over 15533.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09094, pruned_loss=0.01233, audio_tagging_loss=0.008988, over 3058563.05 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 16:00:16,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3463240.0, ans=0.0
2023-11-26 16:00:19,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3463306.6666666665, ans=0.125
2023-11-26 16:00:20,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519500
2023-11-26 16:00:28,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs.
limit=15.0 2023-11-26 16:00:35,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3463373.3333333335, ans=0.125 2023-11-26 16:00:41,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3463440.0, ans=15.0 2023-11-26 16:00:46,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 8.970e+01 9.604e+01 1.012e+02 1.518e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-26 16:00:52,116 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2500, loss[loss=0.07292, simple_loss=0.09731, pruned_loss=0.0158, audio_tagging_loss=0.00846, over 16128.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09031, pruned_loss=0.0123, audio_tagging_loss=0.009044, over 3057869.90 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:00:59,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3463506.6666666665, ans=0.125 2023-11-26 16:01:16,343 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-26 16:01:17,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-11-26 16:01:20,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=22.5 2023-11-26 16:01:47,776 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2550, loss[loss=0.05293, simple_loss=0.06981, pruned_loss=0.009748, audio_tagging_loss=0.008277, over 15129.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08924, pruned_loss=0.01217, audio_tagging_loss=0.008964, over 3052935.03 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:01:53,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3463840.0, ans=0.125 2023-11-26 16:02:05,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3463906.6666666665, ans=0.125 2023-11-26 16:02:08,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3463906.6666666665, ans=0.0 2023-11-26 16:02:12,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-26 16:02:15,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-26 16:02:37,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.57 vs. limit=10.0 2023-11-26 16:02:40,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.652e+01 9.307e+01 9.985e+01 1.166e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 16:02:43,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3464173.3333333335, ans=0.0 2023-11-26 16:02:44,316 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2600, loss[loss=0.05908, simple_loss=0.08683, pruned_loss=0.009446, audio_tagging_loss=0.006219, over 15915.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08868, pruned_loss=0.0121, audio_tagging_loss=0.008762, over 3044691.66 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:02:47,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3464173.3333333335, ans=0.0 2023-11-26 16:02:52,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3464173.3333333335, ans=0.125 2023-11-26 16:02:55,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3464240.0, ans=0.0 2023-11-26 16:03:09,613 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-26 16:03:38,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-26 16:03:40,915 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2650, loss[loss=0.06376, simple_loss=0.08775, pruned_loss=0.01182, audio_tagging_loss=0.008068, over 14696.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08901, pruned_loss=0.01209, audio_tagging_loss=0.008712, over 3041499.27 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:03:54,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3464573.3333333335, ans=0.125 2023-11-26 16:04:05,218 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-26 16:04:05,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3464640.0, ans=0.125 2023-11-26 16:04:08,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-11-26 16:04:08,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2023-11-26 16:04:09,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2023-11-26 16:04:09,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3464640.0, ans=0.5 2023-11-26 16:04:32,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.790e+01 9.468e+01 1.013e+02 1.366e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 16:04:36,912 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2700, loss[loss=0.06383, simple_loss=0.08832, pruned_loss=0.01055, audio_tagging_loss=0.009116, over 15651.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08907, pruned_loss=0.01207, audio_tagging_loss=0.008687, over 3044696.75 frames. 
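The [scaling.py:213] entries print module hyperparameters (conv_skip_rate, bypass skip_rate, balancer probs, dropout_p, and so on) whose current value ans=... depends on the batch count. A plausible minimal ScheduledFloat in that spirit, with made-up breakpoints (the recipe's real schedules differ per module):

```python
class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in batch count.

    Illustrative sketch only; the breakpoints below are invented, but the
    ans= values in the log are exactly this kind of scheduled quantity.
    """
    def __init__(self, *points):
        self.points = sorted(points)          # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        if batch_count <= self.points[0][0]:
            return self.points[0][1]
        if batch_count >= self.points[-1][0]:
            return self.points[-1][1]
        for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value_at(2000.0))  # 0.125 -- the shape of the logged ans= values
```

By this point in the run (batch counts past 3.4 million) most such schedules would long since sit at their final constant value, which is consistent with the same ans= numbers repeating for a given parameter.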
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:04:37,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3464840.0, ans=0.2 2023-11-26 16:04:39,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3464840.0, ans=0.125 2023-11-26 16:04:51,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3464906.6666666665, ans=0.1 2023-11-26 16:05:02,147 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-26 16:05:05,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3464973.3333333335, ans=0.125 2023-11-26 16:05:29,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3465106.6666666665, ans=0.0 2023-11-26 16:05:33,065 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2750, loss[loss=0.07997, simple_loss=0.112, pruned_loss=0.01894, audio_tagging_loss=0.005043, over 14865.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08857, pruned_loss=0.01209, audio_tagging_loss=0.008713, over 3049404.61 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:05:51,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3465240.0, ans=10.0 2023-11-26 16:05:52,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3465240.0, ans=0.2 2023-11-26 16:05:57,576 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-26 16:06:22,655 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:06:25,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.811e+01 9.283e+01 1.006e+02 1.287e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 16:06:26,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3465440.0, ans=0.025 2023-11-26 16:06:29,586 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2800, loss[loss=0.08174, simple_loss=0.117, pruned_loss=0.01762, audio_tagging_loss=0.005629, over 16443.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08847, pruned_loss=0.01203, audio_tagging_loss=0.00863, over 3049458.48 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:06:38,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465506.6666666665, ans=0.1 2023-11-26 16:06:48,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3465573.3333333335, ans=0.125 2023-11-26 16:06:49,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3465573.3333333335, ans=0.0 2023-11-26 16:06:54,080 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-26 16:07:09,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=22.5 2023-11-26 16:07:11,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3465706.6666666665, ans=0.2 2023-11-26 16:07:24,879 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2850, loss[loss=0.06985, simple_loss=0.09458, pruned_loss=0.01283, audio_tagging_loss=0.009729, over 14832.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08831, pruned_loss=0.01205, audio_tagging_loss=0.00854, over 3047228.42 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:07:30,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3465840.0, ans=0.015 2023-11-26 16:07:50,756 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-26 16:07:53,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3465973.3333333335, ans=0.125 2023-11-26 16:08:00,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3466040.0, ans=0.125 2023-11-26 16:08:07,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3466040.0, ans=0.1 2023-11-26 16:08:11,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3466106.6666666665, ans=0.125 2023-11-26 16:08:18,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.838e+01 9.404e+01 1.047e+02 1.303e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 16:08:21,801 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2900, loss[loss=0.08285, simple_loss=0.1136, pruned_loss=0.01798, audio_tagging_loss=0.008082, over 15060.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0885, pruned_loss=0.01222, audio_tagging_loss=0.008534, over 3047973.75 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:08:46,470 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-26 16:09:00,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3466373.3333333335, ans=0.0 2023-11-26 16:09:00,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3466373.3333333335, ans=0.125 2023-11-26 16:09:02,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3466373.3333333335, ans=0.125 2023-11-26 16:09:08,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3466440.0, ans=0.0 2023-11-26 16:09:18,646 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2950, loss[loss=0.0613, simple_loss=0.08353, pruned_loss=0.009477, audio_tagging_loss=0.01006, over 14906.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08887, pruned_loss=0.01216, audio_tagging_loss=0.008652, over 3050667.69 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:09:40,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3466640.0, ans=0.125 2023-11-26 16:09:43,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-26 16:09:54,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3466706.6666666665, ans=0.125 2023-11-26 16:10:12,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.851e+01 9.554e+01 1.014e+02 1.213e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 16:10:16,213 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3000, loss[loss=0.08735, simple_loss=0.1311, pruned_loss=0.01743, audio_tagging_loss=0.004346, over 15431.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08962, pruned_loss=0.01237, audio_tagging_loss=0.008739, over 3052217.75 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:10:16,213 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 16:10:48,837 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05748, simple_loss=0.05058, pruned_loss=0.005287, audio_tagging_loss=0.02691, over 4681554.00 frames. 2023-11-26 16:10:48,837 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 16:11:13,574 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-26 16:11:13,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3466973.3333333335, ans=0.0 2023-11-26 16:11:22,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3467040.0, ans=0.2 2023-11-26 16:11:32,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3467106.6666666665, ans=0.0 2023-11-26 16:11:35,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-26 16:11:45,575 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3050, loss[loss=0.07157, simple_loss=0.1029, pruned_loss=0.0137, audio_tagging_loss=0.006428, over 15480.00 frames. 
], tot_loss[loss=0.06552, simple_loss=0.08898, pruned_loss=0.0122, audio_tagging_loss=0.008831, over 3058535.64 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:11:59,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3467240.0, ans=15.0 2023-11-26 16:12:09,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-26 16:12:19,134 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:12:24,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3467373.3333333335, ans=0.0 2023-11-26 16:12:37,804 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.992e+01 9.720e+01 1.054e+02 1.278e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-26 16:12:41,168 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3100, loss[loss=0.04733, simple_loss=0.057, pruned_loss=0.006778, audio_tagging_loss=0.01206, over 14325.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08948, pruned_loss=0.01213, audio_tagging_loss=0.008942, over 3054411.25 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:13:03,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3467640.0, ans=0.125 2023-11-26 16:13:06,283 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-26 16:13:06,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.89 vs. limit=22.5 2023-11-26 16:13:09,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3467640.0, ans=0.1 2023-11-26 16:13:23,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-26 16:13:23,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.41 vs. limit=10.0 2023-11-26 16:13:36,584 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3150, loss[loss=0.08077, simple_loss=0.1096, pruned_loss=0.01474, audio_tagging_loss=0.01124, over 14835.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08933, pruned_loss=0.01201, audio_tagging_loss=0.008942, over 3049739.58 frames. 
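The tot_loss[...] figures are per-frame averages accumulated over the epoch so far: each batch contributes its frame count and its summed losses, and the printout divides one by the other ("over 3,049,739.58 frames" keeps growing from batch to batch, and the validation pass earlier does the same reduction over its fixed 4,681,554 frames). A small illustrative reduction, not icefall's MetricsTracker verbatim, using two per-batch figures from the surrounding log:

```python
class LossTracker(dict):
    """Frame-weighted running averages, like the tot_loss[...] printouts."""

    def accumulate(self, frames: float, **loss_sums: float) -> None:
        self["frames"] = self.get("frames", 0.0) + frames
        for name, total in loss_sums.items():
            self[name] = self.get(name, 0.0) + total

    def per_frame(self) -> dict:
        return {k: v / self["frames"] for k, v in self.items() if k != "frames"}

tracker = LossTracker()
# Per-batch losses are logged per frame, so multiply back by the frame count:
tracker.accumulate(frames=15480.0, loss=0.07157 * 15480.0)  # batch 3050 above
tracker.accumulate(frames=14325.0, loss=0.04733 * 14325.0)  # batch 3100 above
print(tracker.per_frame())  # frame-weighted mean, as in tot_loss[... over N frames]
```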
], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 16:13:39,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3467840.0, ans=0.125 2023-11-26 16:13:41,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3467840.0, ans=0.0 2023-11-26 16:14:01,391 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-26 16:14:21,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3468106.6666666665, ans=0.125 2023-11-26 16:14:23,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-26 16:14:31,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.852e+01 9.512e+01 1.032e+02 1.320e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:14:31,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2023-11-26 16:14:33,235 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3200, loss[loss=0.06375, simple_loss=0.08287, pruned_loss=0.01226, audio_tagging_loss=0.01005, over 14656.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08874, pruned_loss=0.01187, audio_tagging_loss=0.009017, over 3044574.75 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:14:33,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3468173.3333333335, ans=0.125 2023-11-26 16:14:56,917 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-26 16:15:05,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3468373.3333333335, ans=0.09899494936611666 2023-11-26 16:15:28,511 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3250, loss[loss=0.04929, simple_loss=0.06143, pruned_loss=0.007849, audio_tagging_loss=0.01072, over 14719.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08837, pruned_loss=0.01185, audio_tagging_loss=0.009045, over 3038580.77 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:15:29,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3468506.6666666665, ans=0.125 2023-11-26 16:15:31,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3468506.6666666665, ans=0.0 2023-11-26 16:15:42,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3468573.3333333335, ans=0.125 2023-11-26 16:15:46,413 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.88 vs. 
limit=15.0 2023-11-26 16:15:49,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3468640.0, ans=0.0 2023-11-26 16:15:53,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3468640.0, ans=0.2 2023-11-26 16:15:54,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-26 16:16:10,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3468706.6666666665, ans=0.1 2023-11-26 16:16:21,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.992e+01 9.345e+01 1.022e+02 1.465e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 16:16:23,913 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3300, loss[loss=0.09435, simple_loss=0.1241, pruned_loss=0.02174, audio_tagging_loss=0.01056, over 16488.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08977, pruned_loss=0.01214, audio_tagging_loss=0.009007, over 3048324.39 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:16:34,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3468840.0, ans=0.0 2023-11-26 16:16:49,503 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-26 16:16:52,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3468973.3333333335, ans=0.0 2023-11-26 16:16:58,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3469040.0, ans=0.0 2023-11-26 16:17:18,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3469106.6666666665, ans=0.125 2023-11-26 16:17:18,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3469106.6666666665, ans=0.125 2023-11-26 16:17:21,074 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3350, loss[loss=0.05057, simple_loss=0.06771, pruned_loss=0.008062, audio_tagging_loss=0.008656, over 15330.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08957, pruned_loss=0.01211, audio_tagging_loss=0.008903, over 3052129.31 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:17:22,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3469173.3333333335, ans=0.2 2023-11-26 16:17:23,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3469173.3333333335, ans=0.04949747468305833 2023-11-26 16:17:33,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3469240.0, ans=0.2 2023-11-26 16:17:36,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3469240.0, ans=0.0 2023-11-26 16:17:44,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-26 16:17:44,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3469306.6666666665, ans=0.0 2023-11-26 16:18:01,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3469373.3333333335, ans=0.125 2023-11-26 16:18:07,851 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.67 vs. limit=6.0 2023-11-26 16:18:13,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.575e+01 9.293e+01 1.025e+02 1.253e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 16:18:15,812 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3400, loss[loss=0.08565, simple_loss=0.1153, pruned_loss=0.01939, audio_tagging_loss=0.008591, over 15723.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08989, pruned_loss=0.01215, audio_tagging_loss=0.008771, over 3060706.23 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:18:22,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3469506.6666666665, ans=0.0 2023-11-26 16:18:40,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-26 16:18:41,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0 2023-11-26 16:18:49,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3469706.6666666665, ans=0.2 2023-11-26 16:18:55,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-11-26 16:19:02,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-11-26 16:19:07,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3469773.3333333335, ans=0.0 2023-11-26 16:19:07,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2023-11-26 16:19:10,988 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3450, loss[loss=0.07412, simple_loss=0.1031, pruned_loss=0.01338, audio_tagging_loss=0.009209, over 15541.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08935, pruned_loss=0.01197, audio_tagging_loss=0.008656, over 3063537.30 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:19:19,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3469840.0, ans=0.125 2023-11-26 16:19:20,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=22.5 2023-11-26 16:19:36,340 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-26 16:19:44,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3470040.0, ans=0.125 2023-11-26 16:20:05,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.921e+01 9.582e+01 1.025e+02 1.197e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 16:20:07,464 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3500, loss[loss=0.05607, simple_loss=0.07188, pruned_loss=0.01045, audio_tagging_loss=0.00968, over 15169.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09021, pruned_loss=0.01211, audio_tagging_loss=0.008565, over 3057606.43 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:20:17,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3470240.0, ans=0.05 2023-11-26 16:20:23,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3470240.0, ans=0.2 2023-11-26 16:20:26,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=12.0 2023-11-26 16:20:27,931 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2023-11-26 16:20:31,632 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-26 16:20:35,904 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:20:41,877 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:20:42,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3470373.3333333335, ans=0.125 2023-11-26 16:20:49,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3470373.3333333335, ans=0.035 2023-11-26 16:20:56,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3470440.0, ans=0.2 2023-11-26 16:21:01,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3470440.0, ans=0.125 2023-11-26 16:21:03,076 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3550, loss[loss=0.08012, simple_loss=0.1122, pruned_loss=0.01444, audio_tagging_loss=0.009574, over 15161.00 frames. 
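The [scaling.py:1022] Whitening lines compare a per-module statistic against a fixed limit (metric=13.83 vs. limit=22.5 just above). The metric is 1.0 when the activation covariance is a multiple of the identity ("white" features) and approaches the channel count as the variance collapses onto one direction; a penalty is applied only when it drifts past the limit. One assumed estimator with that behaviour, for illustration (not necessarily the exact formula in scaling.py):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations.

    Returns 1.0 for covariance proportional to the identity, up to
    num_channels when all variance sits in a single direction -- the kind
    of quantity the 'metric=... vs. limit=...' log lines report.
    """
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]            # (C, C) sample covariance
    num_channels = cov.shape[0]
    return (cov ** 2).mean() * num_channels / cov.diag().mean() ** 2
```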
], tot_loss[loss=0.0659, simple_loss=0.09033, pruned_loss=0.01209, audio_tagging_loss=0.008641, over 3061555.52 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:21:18,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.28 vs. limit=22.5 2023-11-26 16:21:27,169 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-26 16:21:28,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3470640.0, ans=0.125 2023-11-26 16:21:31,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3470640.0, ans=0.125 2023-11-26 16:21:31,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3470640.0, ans=0.125 2023-11-26 16:21:34,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3470640.0, ans=0.125 2023-11-26 16:21:45,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3470706.6666666665, ans=0.125 2023-11-26 16:21:55,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.906e+01 9.475e+01 1.015e+02 1.360e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:21:57,994 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3600, loss[loss=0.04197, simple_loss=0.05176, pruned_loss=0.005811, audio_tagging_loss=0.01027, over 14959.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08991, pruned_loss=0.01206, audio_tagging_loss=0.008629, over 3055030.04 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:22:00,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2023-11-26 16:22:21,207 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:22:23,265 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-26 16:22:24,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3470973.3333333335, ans=0.0 2023-11-26 16:22:29,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3470973.3333333335, ans=0.125 2023-11-26 16:22:54,296 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3650, loss[loss=0.04865, simple_loss=0.07395, pruned_loss=0.005337, audio_tagging_loss=0.006336, over 14395.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08989, pruned_loss=0.01225, audio_tagging_loss=0.00862, over 3051365.93 frames. 
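The grad_scale field in the loss lines is fp16 dynamic loss scaling at work: the scale is halved when a step produces inf/nan gradients and doubled back after a stretch of overflow-free steps, which is why it wanders between 8.0, 16.0 and 32.0 through this part of the log. The standard torch.cuda.amp pattern, with a stand-in model and optimizer (the recipe's actual wrapper may differ):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()      # stand-in network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = GradScaler(init_scale=32.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

for _ in range(3):                           # stand-in training steps
    x = torch.randn(4, 80, device="cuda")
    optimizer.zero_grad()
    with autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()            # backward through the scaled loss
    scaler.step(optimizer)                   # skipped if grads overflowed
    scaler.update()                          # halve on overflow, grow when stable
    print(scaler.get_scale())                # the grad_scale printed in the log
```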
], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:23:00,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3471173.3333333335, ans=0.1 2023-11-26 16:23:10,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3471240.0, ans=0.0 2023-11-26 16:23:13,183 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:23:14,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2023-11-26 16:23:18,317 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-26 16:23:25,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3471373.3333333335, ans=0.0 2023-11-26 16:23:31,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3471373.3333333335, ans=0.125 2023-11-26 16:23:47,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.866e+01 9.412e+01 1.015e+02 1.534e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 16:23:49,725 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3700, loss[loss=0.05555, simple_loss=0.07961, pruned_loss=0.00709, audio_tagging_loss=0.008659, over 15559.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09082, pruned_loss=0.01232, audio_tagging_loss=0.008614, over 3056868.06 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:24:13,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-26 16:24:22,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3471706.6666666665, ans=0.0 2023-11-26 16:24:28,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3471706.6666666665, ans=0.0 2023-11-26 16:24:44,722 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3750, loss[loss=0.06568, simple_loss=0.09334, pruned_loss=0.01098, audio_tagging_loss=0.008031, over 15280.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09078, pruned_loss=0.01233, audio_tagging_loss=0.00873, over 3050888.52 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:24:54,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3471906.6666666665, ans=0.125 2023-11-26 16:25:04,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-11-26 16:25:08,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3471973.3333333335, ans=0.1 2023-11-26 16:25:09,434 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-26 16:25:21,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.03 vs. 
limit=15.0 2023-11-26 16:25:23,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3472040.0, ans=0.125 2023-11-26 16:25:24,030 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:25:28,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2023-11-26 16:25:31,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-26 16:25:34,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3472106.6666666665, ans=0.125 2023-11-26 16:25:39,675 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.098e+01 9.695e+01 1.024e+02 1.279e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 16:25:39,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3472173.3333333335, ans=0.125 2023-11-26 16:25:40,794 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3800, loss[loss=0.05544, simple_loss=0.07313, pruned_loss=0.01043, audio_tagging_loss=0.008454, over 15016.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09127, pruned_loss=0.01249, audio_tagging_loss=0.008631, over 3052833.71 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:25:41,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3472173.3333333335, ans=0.125 2023-11-26 16:25:55,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3472240.0, ans=0.0 2023-11-26 16:26:00,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472240.0, ans=0.1 2023-11-26 16:26:05,839 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-26 16:26:08,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3472306.6666666665, ans=0.07 2023-11-26 16:26:14,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3472373.3333333335, ans=0.0 2023-11-26 16:26:23,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3472373.3333333335, ans=0.125 2023-11-26 16:26:33,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.69 vs. 
limit=15.0 2023-11-26 16:26:35,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3472506.6666666665, ans=0.125 2023-11-26 16:26:36,564 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3850, loss[loss=0.07164, simple_loss=0.08659, pruned_loss=0.01901, audio_tagging_loss=0.009332, over 14860.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09118, pruned_loss=0.01259, audio_tagging_loss=0.008687, over 3056443.38 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:26:52,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3472573.3333333335, ans=0.125 2023-11-26 16:26:59,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3472640.0, ans=0.0 2023-11-26 16:26:59,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-11-26 16:27:01,203 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-26 16:27:05,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3472640.0, ans=0.125 2023-11-26 16:27:14,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3472706.6666666665, ans=0.0 2023-11-26 16:27:30,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3472773.3333333335, ans=0.025 2023-11-26 16:27:30,925 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.861e+01 9.516e+01 1.005e+02 1.326e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 16:27:32,022 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3900, loss[loss=0.05922, simple_loss=0.08268, pruned_loss=0.008296, audio_tagging_loss=0.009588, over 15466.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09127, pruned_loss=0.01267, audio_tagging_loss=0.008728, over 3051188.23 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:27:33,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3472840.0, ans=0.125 2023-11-26 16:27:34,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3472840.0, ans=0.125 2023-11-26 16:27:35,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.22 vs. 
limit=12.0 2023-11-26 16:27:37,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3472840.0, ans=0.0 2023-11-26 16:27:39,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3472840.0, ans=0.025 2023-11-26 16:27:51,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3472906.6666666665, ans=0.125 2023-11-26 16:27:57,179 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-26 16:28:01,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3472973.3333333335, ans=0.125 2023-11-26 16:28:02,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3472973.3333333335, ans=0.125 2023-11-26 16:28:07,401 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.85 vs. limit=10.0 2023-11-26 16:28:28,139 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3950, loss[loss=0.05335, simple_loss=0.07314, pruned_loss=0.007365, audio_tagging_loss=0.009419, over 14965.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09148, pruned_loss=0.01262, audio_tagging_loss=0.008756, over 3050777.15 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:28:33,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3473173.3333333335, ans=0.125 2023-11-26 16:28:34,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3473173.3333333335, ans=0.2 2023-11-26 16:28:46,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0 2023-11-26 16:28:51,995 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-26 16:28:56,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3473306.6666666665, ans=0.1 2023-11-26 16:29:04,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2023-11-26 16:29:22,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.021e+01 9.497e+01 1.017e+02 1.308e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 16:29:23,984 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4000, loss[loss=0.06166, simple_loss=0.0818, pruned_loss=0.009247, audio_tagging_loss=0.01151, over 15016.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09163, pruned_loss=0.01264, audio_tagging_loss=0.008917, over 3046857.95 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:29:48,270 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-26 16:30:16,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3473773.3333333335, ans=0.2 2023-11-26 16:30:19,828 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4050, loss[loss=0.0716, simple_loss=0.09458, pruned_loss=0.01175, audio_tagging_loss=0.01256, over 15000.00 frames. 
], tot_loss[loss=0.06769, simple_loss=0.09186, pruned_loss=0.01276, audio_tagging_loss=0.008993, over 3044840.43 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:30:22,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3473840.0, ans=0.125 2023-11-26 16:30:23,558 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:30:44,964 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-26 16:30:59,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3474040.0, ans=0.0 2023-11-26 16:31:06,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3474106.6666666665, ans=0.0 2023-11-26 16:31:13,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474106.6666666665, ans=0.1 2023-11-26 16:31:14,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.820e+01 9.465e+01 1.007e+02 1.196e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:31:15,935 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4100, loss[loss=0.05998, simple_loss=0.0832, pruned_loss=0.009406, audio_tagging_loss=0.008974, over 16098.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09143, pruned_loss=0.01262, audio_tagging_loss=0.008943, over 3045604.79 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:31:32,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-26 16:31:36,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3474240.0, ans=0.125 2023-11-26 16:31:38,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. 
limit=6.0 2023-11-26 16:31:40,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-26 16:31:42,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3474306.6666666665, ans=0.125 2023-11-26 16:31:46,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3474306.6666666665, ans=0.2 2023-11-26 16:31:51,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3474373.3333333335, ans=0.2 2023-11-26 16:31:55,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3474373.3333333335, ans=0.2 2023-11-26 16:32:03,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3474440.0, ans=0.125 2023-11-26 16:32:11,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3474506.6666666665, ans=0.125 2023-11-26 16:32:12,709 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4150, loss[loss=0.05438, simple_loss=0.07211, pruned_loss=0.008359, audio_tagging_loss=0.009967, over 14923.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09075, pruned_loss=0.0125, audio_tagging_loss=0.008785, over 3040832.36 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:32:12,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3474506.6666666665, ans=0.0 2023-11-26 16:32:13,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3474506.6666666665, ans=10.0 2023-11-26 16:32:19,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3474506.6666666665, ans=0.0 2023-11-26 16:32:28,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2023-11-26 16:32:30,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474573.3333333335, ans=0.1 2023-11-26 16:32:34,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3474640.0, ans=0.125 2023-11-26 16:32:36,823 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-26 16:32:54,684 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 16:32:59,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3474773.3333333335, ans=0.125 2023-11-26 16:33:02,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3474773.3333333335, ans=0.125 2023-11-26 16:33:07,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 9.007e+01 9.363e+01 1.013e+02 1.321e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 16:33:08,500 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4200, loss[loss=0.0623, simple_loss=0.0897, pruned_loss=0.009871, audio_tagging_loss=0.007578, over 15052.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09025, pruned_loss=0.01236, audio_tagging_loss=0.008734, over 3032365.79 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:33:22,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3474906.6666666665, ans=0.125 2023-11-26 16:33:27,302 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2023-11-26 16:33:33,739 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-26 16:34:04,095 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4250, loss[loss=0.06839, simple_loss=0.09143, pruned_loss=0.01306, audio_tagging_loss=0.009609, over 15238.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09029, pruned_loss=0.01234, audio_tagging_loss=0.008614, over 3035340.55 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:34:18,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3475240.0, ans=0.1 2023-11-26 16:34:28,587 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-26 16:34:30,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3475306.6666666665, ans=0.125 2023-11-26 16:34:33,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=22.5 2023-11-26 16:34:59,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.954e+01 9.475e+01 1.016e+02 1.438e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:35:00,235 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4300, loss[loss=0.06371, simple_loss=0.08224, pruned_loss=0.01433, audio_tagging_loss=0.008264, over 15261.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09118, pruned_loss=0.01246, audio_tagging_loss=0.008549, over 3038016.32 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:35:02,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3475506.6666666665, ans=0.0 2023-11-26 16:35:23,635 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-26 16:35:34,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.71 vs. 
limit=10.0 2023-11-26 16:35:48,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3475773.3333333335, ans=0.125 2023-11-26 16:35:55,107 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4350, loss[loss=0.05169, simple_loss=0.0695, pruned_loss=0.008759, audio_tagging_loss=0.008187, over 13856.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09117, pruned_loss=0.01234, audio_tagging_loss=0.008536, over 3031048.71 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:36:00,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-26 16:36:03,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3475840.0, ans=0.125 2023-11-26 16:36:07,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2023-11-26 16:36:19,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-26 16:36:23,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3475973.3333333335, ans=0.0 2023-11-26 16:36:26,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3475973.3333333335, ans=0.0 2023-11-26 16:36:39,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3476106.6666666665, ans=0.05 2023-11-26 16:36:49,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.904e+01 9.369e+01 1.014e+02 1.389e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 16:36:50,193 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4400, loss[loss=0.06178, simple_loss=0.07768, pruned_loss=0.01358, audio_tagging_loss=0.009359, over 15953.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.0907, pruned_loss=0.01225, audio_tagging_loss=0.008517, over 3034349.65 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:37:05,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3476240.0, ans=0.125 2023-11-26 16:37:14,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-26 16:37:20,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3476306.6666666665, ans=0.125 2023-11-26 16:37:33,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3476440.0, ans=0.1 2023-11-26 16:37:46,791 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4450, loss[loss=0.07123, simple_loss=0.09675, pruned_loss=0.01463, audio_tagging_loss=0.008226, over 15560.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09041, pruned_loss=0.01216, audio_tagging_loss=0.008416, over 3040034.71 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:38:10,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-26 16:38:18,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476706.6666666665, ans=0.1 2023-11-26 16:38:24,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-26 16:38:40,517 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.912e+01 9.426e+01 1.014e+02 1.545e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 16:38:41,579 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4500, loss[loss=0.04972, simple_loss=0.06193, pruned_loss=0.008761, audio_tagging_loss=0.00999, over 15180.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08958, pruned_loss=0.01213, audio_tagging_loss=0.008463, over 3040290.09 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:38:48,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3476840.0, ans=0.125 2023-11-26 16:38:51,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3476906.6666666665, ans=12.0 2023-11-26 16:38:53,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2023-11-26 16:39:05,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-26 16:39:11,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3476973.3333333335, ans=0.0 2023-11-26 16:39:24,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-26 16:39:36,116 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4550, loss[loss=0.06425, simple_loss=0.09289, pruned_loss=0.01029, audio_tagging_loss=0.007515, over 15381.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.0895, pruned_loss=0.01209, audio_tagging_loss=0.008481, over 3042584.85 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:39:40,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2023-11-26 16:39:52,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. limit=5.0 2023-11-26 16:40:01,321 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-26 16:40:20,740 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 16:40:29,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3477440.0, ans=0.0 2023-11-26 16:40:30,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.935e+01 8.644e+01 9.356e+01 1.024e+02 1.228e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 16:40:31,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3477506.6666666665, ans=0.0 2023-11-26 16:40:31,924 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4600, loss[loss=0.08145, simple_loss=0.1107, pruned_loss=0.01765, audio_tagging_loss=0.008465, over 15464.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08866, pruned_loss=0.01195, audio_tagging_loss=0.008547, over 3044301.15 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:40:37,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3477506.6666666665, ans=0.0 2023-11-26 16:40:57,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-26 16:40:58,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0 2023-11-26 16:41:28,633 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4650, loss[loss=0.0697, simple_loss=0.091, pruned_loss=0.0152, audio_tagging_loss=0.008997, over 14821.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08886, pruned_loss=0.0122, audio_tagging_loss=0.008588, over 3044017.30 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:41:32,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3477840.0, ans=0.2 2023-11-26 16:41:32,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3477840.0, ans=0.125 2023-11-26 16:41:34,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3477840.0, ans=0.125 2023-11-26 16:41:52,965 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-26 16:41:55,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3477973.3333333335, ans=0.0 2023-11-26 16:41:55,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-26 16:42:05,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478040.0, ans=0.1 2023-11-26 16:42:11,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-11-26 16:42:23,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.839e+01 9.612e+01 1.022e+02 1.375e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 16:42:23,768 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4700, loss[loss=0.05064, simple_loss=0.06625, pruned_loss=0.007499, audio_tagging_loss=0.01001, over 15871.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08899, pruned_loss=0.0122, audio_tagging_loss=0.008694, over 3048244.41 frames. 
], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:42:31,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3478173.3333333335, ans=0.07 2023-11-26 16:42:35,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3478240.0, ans=0.0 2023-11-26 16:42:35,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3478240.0, ans=0.125 2023-11-26 16:42:48,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3478306.6666666665, ans=0.2 2023-11-26 16:42:49,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-26 16:42:59,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3478373.3333333335, ans=10.0 2023-11-26 16:43:04,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3478373.3333333335, ans=0.0 2023-11-26 16:43:06,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3478373.3333333335, ans=0.0 2023-11-26 16:43:15,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3478440.0, ans=0.125 2023-11-26 16:43:19,240 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4750, loss[loss=0.0615, simple_loss=0.08467, pruned_loss=0.009698, audio_tagging_loss=0.009469, over 14371.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08879, pruned_loss=0.01208, audio_tagging_loss=0.008768, over 3051433.53 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:43:27,320 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:43:28,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3478506.6666666665, ans=0.1 2023-11-26 16:43:39,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2023-11-26 16:43:43,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-26 16:43:54,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3478706.6666666665, ans=0.0 2023-11-26 16:44:05,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3478773.3333333335, ans=0.125 2023-11-26 16:44:09,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3478773.3333333335, ans=0.035 2023-11-26 16:44:09,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3478773.3333333335, ans=0.125 2023-11-26 16:44:15,711 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4800, loss[loss=0.06946, simple_loss=0.09613, pruned_loss=0.01239, audio_tagging_loss=0.008999, over 15779.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08933, pruned_loss=0.0123, audio_tagging_loss=0.008802, over 3050150.24 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:44:16,765 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.936e+01 9.415e+01 1.023e+02 1.286e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 16:44:39,425 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-26 16:44:41,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3478973.3333333335, ans=0.125 2023-11-26 16:44:49,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3479040.0, ans=0.125 2023-11-26 16:44:51,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3479040.0, ans=0.0 2023-11-26 16:44:56,292 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:45:11,480 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4850, loss[loss=0.08847, simple_loss=0.1304, pruned_loss=0.01502, audio_tagging_loss=0.008243, over 15850.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09059, pruned_loss=0.01251, audio_tagging_loss=0.008885, over 3051430.86 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:45:13,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3479173.3333333335, ans=0.125 2023-11-26 16:45:36,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-26 16:45:50,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3479373.3333333335, ans=0.125 2023-11-26 16:45:50,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3479373.3333333335, ans=0.125 2023-11-26 16:46:06,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-11-26 16:46:07,675 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4900, loss[loss=0.05838, simple_loss=0.07455, pruned_loss=0.009411, audio_tagging_loss=0.0117, over 15101.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08963, pruned_loss=0.01238, audio_tagging_loss=0.008897, over 3050524.65 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:46:08,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.681e+01 9.501e+01 1.005e+02 1.327e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 16:46:18,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3479573.3333333335, ans=0.07 2023-11-26 16:46:32,677 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-26 16:46:33,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3479640.0, ans=0.125 2023-11-26 16:46:50,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3479706.6666666665, ans=0.09899494936611666 2023-11-26 16:47:03,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3479840.0, ans=0.125 2023-11-26 16:47:03,985 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4950, loss[loss=0.05933, simple_loss=0.07874, pruned_loss=0.009665, audio_tagging_loss=0.0103, over 14807.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08865, pruned_loss=0.01218, audio_tagging_loss=0.008755, over 3045811.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:47:16,454 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:47:27,913 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-26 16:47:42,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3480040.0, ans=0.125 2023-11-26 16:48:00,037 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5000, loss[loss=0.06863, simple_loss=0.09574, pruned_loss=0.01434, audio_tagging_loss=0.006419, over 16119.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0894, pruned_loss=0.01234, audio_tagging_loss=0.008705, over 3037979.33 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:48:01,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.862e+01 9.666e+01 1.035e+02 1.226e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 16:48:08,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3480173.3333333335, ans=0.125 2023-11-26 16:48:25,373 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-26 16:48:34,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3480373.3333333335, ans=0.05 2023-11-26 16:48:54,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3480506.6666666665, ans=0.125 2023-11-26 16:48:55,730 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5050, loss[loss=0.04674, simple_loss=0.05619, pruned_loss=0.007205, audio_tagging_loss=0.01144, over 14810.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08865, pruned_loss=0.01213, audio_tagging_loss=0.008708, over 3038035.34 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:49:01,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3480506.6666666665, ans=0.125 2023-11-26 16:49:05,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-11-26 16:49:08,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3480573.3333333335, ans=0.0 2023-11-26 16:49:08,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-26 16:49:09,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3480573.3333333335, ans=0.0 2023-11-26 16:49:20,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3480640.0, ans=0.125 2023-11-26 16:49:21,054 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-26 16:49:38,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2023-11-26 16:49:53,097 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5100, loss[loss=0.06116, simple_loss=0.08556, pruned_loss=0.0102, audio_tagging_loss=0.008185, over 14746.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08851, pruned_loss=0.01216, audio_tagging_loss=0.008689, over 3039566.21 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:49:54,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.678e+01 9.277e+01 1.001e+02 1.240e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 16:49:54,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0 2023-11-26 16:50:09,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-11-26 16:50:17,108 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-26 16:50:26,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3481040.0, ans=0.125 2023-11-26 16:50:28,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3481040.0, ans=0.125 2023-11-26 16:50:39,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3481106.6666666665, ans=0.125 2023-11-26 16:50:44,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-11-26 16:50:46,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3481106.6666666665, ans=0.2 2023-11-26 16:50:48,535 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5150, loss[loss=0.08119, simple_loss=0.1146, pruned_loss=0.01452, audio_tagging_loss=0.009388, over 15321.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0882, pruned_loss=0.0121, audio_tagging_loss=0.008673, over 3039906.07 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:51:14,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-26 16:51:17,005 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2023-11-26 16:51:26,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3481373.3333333335, ans=0.1 2023-11-26 16:51:28,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3481373.3333333335, ans=0.1 2023-11-26 16:51:44,719 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5200, loss[loss=0.06555, simple_loss=0.08704, pruned_loss=0.01293, audio_tagging_loss=0.009102, over 14604.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08877, pruned_loss=0.01221, audio_tagging_loss=0.008723, over 3044535.73 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:51:45,725 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.774e+01 9.486e+01 1.034e+02 1.875e+02, threshold=1.897e+02, percent-clipped=1.0 2023-11-26 16:51:50,972 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-11-26 16:52:07,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-26 16:52:09,805 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-26 16:52:20,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3481706.6666666665, ans=0.0 2023-11-26 16:52:34,855 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-26 16:52:38,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3481773.3333333335, ans=0.125 2023-11-26 16:52:41,987 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5250, loss[loss=0.06298, simple_loss=0.08329, pruned_loss=0.01339, audio_tagging_loss=0.007951, over 17417.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08962, pruned_loss=0.01231, audio_tagging_loss=0.008562, over 3051082.28 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:52:53,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3481906.6666666665, ans=0.125 2023-11-26 16:52:56,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3481906.6666666665, ans=0.125 2023-11-26 16:53:02,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3481973.3333333335, ans=0.125 2023-11-26 16:53:05,995 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-26 16:53:37,444 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5300, loss[loss=0.06965, simple_loss=0.1028, pruned_loss=0.01056, audio_tagging_loss=0.0077, over 15192.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08979, pruned_loss=0.01226, audio_tagging_loss=0.008567, over 3051175.99 frames. 
], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:53:39,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.820e+01 9.463e+01 1.024e+02 1.274e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:54:02,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-26 16:54:23,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482440.0, ans=0.1 2023-11-26 16:54:28,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3482440.0, ans=0.125 2023-11-26 16:54:33,243 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5350, loss[loss=0.07119, simple_loss=0.09847, pruned_loss=0.0133, audio_tagging_loss=0.008659, over 15800.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0902, pruned_loss=0.0123, audio_tagging_loss=0.008647, over 3046997.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:54:35,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-26 16:54:53,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3482573.3333333335, ans=0.2 2023-11-26 16:54:58,401 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-26 16:55:01,314 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2023-11-26 16:55:01,999 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:55:26,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3482773.3333333335, ans=0.125 2023-11-26 16:55:30,687 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5400, loss[loss=0.08272, simple_loss=0.114, pruned_loss=0.01804, audio_tagging_loss=0.007677, over 15069.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0905, pruned_loss=0.01247, audio_tagging_loss=0.008669, over 3040183.79 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:55:32,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.129e+01 9.512e+01 1.019e+02 1.244e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:55:52,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3482973.3333333335, ans=0.2 2023-11-26 16:55:54,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-26 16:55:55,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3482973.3333333335, ans=0.0 2023-11-26 16:56:13,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3483040.0, ans=0.125 2023-11-26 16:56:26,050 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5450, loss[loss=0.06268, simple_loss=0.07944, pruned_loss=0.01094, audio_tagging_loss=0.01202, over 15166.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09067, pruned_loss=0.01256, audio_tagging_loss=0.008671, over 3041338.98 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:56:46,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-26 16:56:50,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522500 2023-11-26 16:56:52,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3483306.6666666665, ans=0.125 2023-11-26 16:57:06,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-26 16:57:11,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3483440.0, ans=0.125 2023-11-26 16:57:21,256 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5500, loss[loss=0.06797, simple_loss=0.0779, pruned_loss=0.01804, audio_tagging_loss=0.01097, over 15675.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09002, pruned_loss=0.01236, audio_tagging_loss=0.008808, over 3038040.53 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:57:23,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.753e+01 9.597e+01 1.033e+02 1.583e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 16:57:46,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.98 vs. limit=10.0 2023-11-26 16:57:46,831 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522550 2023-11-26 16:57:55,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3483706.6666666665, ans=0.125 2023-11-26 16:57:55,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2023-11-26 16:58:04,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3483706.6666666665, ans=0.0 2023-11-26 16:58:05,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3483773.3333333335, ans=0.0 2023-11-26 16:58:18,144 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5550, loss[loss=0.06604, simple_loss=0.08268, pruned_loss=0.01381, audio_tagging_loss=0.01089, over 13670.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08863, pruned_loss=0.01217, audio_tagging_loss=0.008966, over 3034195.72 frames. 
], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:58:21,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3483840.0, ans=0.0 2023-11-26 16:58:24,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3483840.0, ans=0.125 2023-11-26 16:58:26,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3483840.0, ans=0.125 2023-11-26 16:58:41,944 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522600 2023-11-26 16:59:06,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3484106.6666666665, ans=0.2 2023-11-26 16:59:12,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3484106.6666666665, ans=0.2 2023-11-26 16:59:13,966 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5600, loss[loss=0.08654, simple_loss=0.1287, pruned_loss=0.01662, audio_tagging_loss=0.005551, over 15346.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0892, pruned_loss=0.01217, audio_tagging_loss=0.008958, over 3042128.07 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:59:16,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.836e+01 9.428e+01 1.004e+02 1.214e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 16:59:23,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3484240.0, ans=0.125 2023-11-26 16:59:23,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3484240.0, ans=0.2 2023-11-26 16:59:38,565 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522650 2023-11-26 16:59:38,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3484306.6666666665, ans=0.125 2023-11-26 16:59:56,684 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:59:59,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-11-26 17:00:04,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=3484440.0, ans=12.0 2023-11-26 17:00:09,438 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5650, loss[loss=0.06221, simple_loss=0.07971, pruned_loss=0.01147, audio_tagging_loss=0.01088, over 15590.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08959, pruned_loss=0.0122, audio_tagging_loss=0.008996, over 3049775.94 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:00:34,604 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522700 2023-11-26 17:00:52,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3484773.3333333335, ans=0.0 2023-11-26 17:01:05,348 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5700, loss[loss=0.07311, simple_loss=0.1069, pruned_loss=0.01262, audio_tagging_loss=0.007061, over 15374.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09064, pruned_loss=0.01246, audio_tagging_loss=0.008975, over 3049755.74 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:01:07,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3484840.0, ans=0.125 2023-11-26 17:01:08,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.017e+01 9.014e+01 9.489e+01 1.005e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 17:01:19,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2023-11-26 17:01:20,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3484906.6666666665, ans=0.0 2023-11-26 17:01:20,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3484906.6666666665, ans=0.07 2023-11-26 17:01:29,926 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522750 2023-11-26 17:01:53,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2023-11-26 17:01:57,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3485106.6666666665, ans=0.5 2023-11-26 17:02:01,803 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5750, loss[loss=0.06332, simple_loss=0.08244, pruned_loss=0.01291, audio_tagging_loss=0.009192, over 14777.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08978, pruned_loss=0.01233, audio_tagging_loss=0.008812, over 3054129.60 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:02:20,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3485240.0, ans=0.125 2023-11-26 17:02:21,235 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:02:22,774 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=22.5 2023-11-26 17:02:25,400 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522800 2023-11-26 17:02:25,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3485306.6666666665, ans=0.125 2023-11-26 17:02:29,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.74 vs. 
limit=15.0 2023-11-26 17:02:34,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3485373.3333333335, ans=0.5 2023-11-26 17:02:57,293 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5800, loss[loss=0.07709, simple_loss=0.09755, pruned_loss=0.01666, audio_tagging_loss=0.01166, over 15585.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08913, pruned_loss=0.01215, audio_tagging_loss=0.008759, over 3056206.28 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:02:59,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.825e+01 9.413e+01 1.036e+02 1.628e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 17:03:06,242 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2023-11-26 17:03:23,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522850 2023-11-26 17:03:38,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3485706.6666666665, ans=0.125 2023-11-26 17:03:44,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=15.0 2023-11-26 17:03:50,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3485773.3333333335, ans=0.0 2023-11-26 17:03:53,386 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5850, loss[loss=0.05241, simple_loss=0.07014, pruned_loss=0.008606, audio_tagging_loss=0.008733, over 15143.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09016, pruned_loss=0.01241, audio_tagging_loss=0.008729, over 3052284.70 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:04:09,659 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2023-11-26 17:04:18,768 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522900 2023-11-26 17:04:41,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2023-11-26 17:04:50,445 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5900, loss[loss=0.07053, simple_loss=0.09393, pruned_loss=0.01302, audio_tagging_loss=0.01054, over 13764.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09016, pruned_loss=0.01226, audio_tagging_loss=0.008712, over 3049392.04 frames. 
], batch size: 52, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:04:52,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.809e+01 9.343e+01 1.010e+02 1.341e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 17:04:56,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3486173.3333333335, ans=0.125 2023-11-26 17:05:01,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3486240.0, ans=0.125 2023-11-26 17:05:04,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3486240.0, ans=0.125 2023-11-26 17:05:13,781 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522950 2023-11-26 17:05:15,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3486306.6666666665, ans=0.125 2023-11-26 17:05:35,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3486440.0, ans=0.95 2023-11-26 17:05:38,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3486440.0, ans=0.125 2023-11-26 17:05:44,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=12.0 2023-11-26 17:05:45,289 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5950, loss[loss=0.06134, simple_loss=0.0918, pruned_loss=0.008108, audio_tagging_loss=0.007332, over 14967.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09067, pruned_loss=0.01237, audio_tagging_loss=0.008651, over 3048284.91 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:06:09,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3486640.0, ans=0.0 2023-11-26 17:06:09,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2023-11-26 17:06:10,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523000 2023-11-26 17:06:10,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3486640.0, ans=0.02 2023-11-26 17:06:20,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3486706.6666666665, ans=0.0 2023-11-26 17:06:24,950 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:06:40,780 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6000, loss[loss=0.07299, simple_loss=0.09331, pruned_loss=0.01593, audio_tagging_loss=0.01041, over 14477.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08888, pruned_loss=0.0122, audio_tagging_loss=0.008686, over 3039018.72 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:06:40,781 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 17:07:13,760 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05792, simple_loss=0.05061, pruned_loss=0.005328, audio_tagging_loss=0.02728, over 4681554.00 frames. 
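
Note on the loss fields: the `loss[...]`, `tot_loss[...]`, and validation records above are internally consistent with a fixed weighted sum, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. For the validation record just above, 0.5 * 0.05061 + 0.005328 + 0.02728 = 0.05791, matching the logged loss=0.05792 to rounding; the batch 6000 record works the same way (0.5 * 0.09331 + 0.01593 + 0.01041 = 0.072995, vs. the logged 0.07299). A minimal sketch of that recombination, with the 0.5 simple-loss scale treated as an assumption read off the logged numbers rather than taken from the training code:

```python
# Recombine the loss fields printed in the records above.
# Assumption (inferred from the logged numbers, not from train_asr.py):
#   loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss

def combined_loss(simple_loss: float,
                  pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5) -> float:
    """Weighted sum matching the 'loss=' field of the records above."""
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

# Epoch 44 validation record above:
#   loss=0.05792, simple_loss=0.05061, pruned_loss=0.005328,
#   audio_tagging_loss=0.02728
total = combined_loss(0.05061, 0.005328, 0.02728)
assert abs(total - 0.05792) < 5e-5
print(f"recombined validation loss: {total:.5f}")  # 0.05791
```
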
2023-11-26 17:07:13,761 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 17:07:16,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.949e+01 9.418e+01 1.019e+02 1.469e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 17:07:20,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3486840.0, ans=0.1 2023-11-26 17:07:37,162 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523050 2023-11-26 17:07:42,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0 2023-11-26 17:07:56,220 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:07:56,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3487040.0, ans=0.125 2023-11-26 17:07:59,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487106.6666666665, ans=0.1 2023-11-26 17:08:04,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3487106.6666666665, ans=0.125 2023-11-26 17:08:08,784 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6050, loss[loss=0.06585, simple_loss=0.09447, pruned_loss=0.01395, audio_tagging_loss=0.004665, over 15307.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08836, pruned_loss=0.01214, audio_tagging_loss=0.008711, over 3028663.09 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:08:13,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3487173.3333333335, ans=0.0 2023-11-26 17:08:19,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5 2023-11-26 17:08:22,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3487240.0, ans=10.0 2023-11-26 17:08:30,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3487306.6666666665, ans=0.125 2023-11-26 17:08:32,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. 
limit=10.0 2023-11-26 17:08:33,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523100 2023-11-26 17:08:38,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3487306.6666666665, ans=0.125 2023-11-26 17:08:52,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3487440.0, ans=0.07 2023-11-26 17:08:59,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3487440.0, ans=0.125 2023-11-26 17:09:04,398 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6100, loss[loss=0.05542, simple_loss=0.07241, pruned_loss=0.009426, audio_tagging_loss=0.009783, over 15820.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08893, pruned_loss=0.01195, audio_tagging_loss=0.008696, over 3036779.61 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:09:08,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.667e+01 9.163e+01 9.920e+01 1.251e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 17:09:11,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3487506.6666666665, ans=0.125 2023-11-26 17:09:17,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3487573.3333333335, ans=0.125 2023-11-26 17:09:29,465 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523150 2023-11-26 17:09:31,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3487640.0, ans=0.1 2023-11-26 17:09:32,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3487640.0, ans=0.2 2023-11-26 17:09:37,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3487706.6666666665, ans=0.125 2023-11-26 17:10:00,748 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6150, loss[loss=0.06829, simple_loss=0.09186, pruned_loss=0.01005, audio_tagging_loss=0.0123, over 14655.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08885, pruned_loss=0.01209, audio_tagging_loss=0.008737, over 3039402.40 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:10:00,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3487840.0, ans=0.125 2023-11-26 17:10:00,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3487840.0, ans=0.2 2023-11-26 17:10:02,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487840.0, ans=0.1 2023-11-26 17:10:19,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3487906.6666666665, ans=0.125 2023-11-26 17:10:22,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.35 vs. 
limit=15.0 2023-11-26 17:10:24,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523200 2023-11-26 17:10:24,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3487973.3333333335, ans=0.0 2023-11-26 17:10:38,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3488040.0, ans=0.0 2023-11-26 17:10:56,656 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6200, loss[loss=0.07996, simple_loss=0.1094, pruned_loss=0.01636, audio_tagging_loss=0.008882, over 14500.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08944, pruned_loss=0.0123, audio_tagging_loss=0.008762, over 3045278.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:10:58,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3488173.3333333335, ans=22.5 2023-11-26 17:10:59,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.701e+01 9.346e+01 1.022e+02 1.320e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 17:11:04,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3488173.3333333335, ans=0.125 2023-11-26 17:11:04,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0 2023-11-26 17:11:22,147 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523250 2023-11-26 17:11:37,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3488373.3333333335, ans=0.125 2023-11-26 17:11:41,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3488440.0, ans=0.2 2023-11-26 17:11:49,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3488440.0, ans=0.125 2023-11-26 17:11:52,825 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6250, loss[loss=0.07247, simple_loss=0.1038, pruned_loss=0.01172, audio_tagging_loss=0.008845, over 15274.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08896, pruned_loss=0.01206, audio_tagging_loss=0.008763, over 3048471.75 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:11:53,027 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:11:54,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3488506.6666666665, ans=0.125 2023-11-26 17:12:04,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3488573.3333333335, ans=0.125 2023-11-26 17:12:06,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3488573.3333333335, ans=0.125 2023-11-26 17:12:10,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3488573.3333333335, ans=0.1 2023-11-26 17:12:17,948 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523300 2023-11-26 17:12:31,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2023-11-26 17:12:31,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3488706.6666666665, ans=0.125 2023-11-26 17:12:49,228 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6300, loss[loss=0.06761, simple_loss=0.09945, pruned_loss=0.0103, audio_tagging_loss=0.007589, over 15110.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08899, pruned_loss=0.01216, audio_tagging_loss=0.00889, over 3047156.67 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:12:52,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.680e+01 9.348e+01 1.004e+02 1.184e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 17:12:53,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3488840.0, ans=0.1 2023-11-26 17:12:58,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-26 17:13:13,201 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523350 2023-11-26 17:13:15,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3488973.3333333335, ans=0.125 2023-11-26 17:13:27,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-26 17:13:30,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3489040.0, ans=0.0 2023-11-26 17:13:38,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0 2023-11-26 17:13:42,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3489106.6666666665, ans=0.5 2023-11-26 17:13:44,488 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6350, loss[loss=0.07298, simple_loss=0.1007, pruned_loss=0.01318, audio_tagging_loss=0.009421, over 15111.00 frames. 
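[Annotation] Most scaling.py entries report a ScheduledFloat: a hyperparameter (dropout probability, skip rate, balancer prob, bypass scale, ...) whose value is a piecewise-linear function of the global batch count rather than a constant. The sketch below shows the idea; the breakpoints are illustrative only, and icefall's scaling.py version carries extra machinery (defaults, arithmetic operators, module integration).

```python
class ScheduledFloat:
    """Piecewise-linear schedule over the global batch count (sketch)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs defining the schedule.
        self.points = sorted(points)
        self.batch_count = 0  # advanced by the training loop each step

    def __float__(self) -> float:
        x = self.batch_count
        x0, y0 = self.points[0]
        if x <= x0:
            return float(y0)
        for x1, y1 in self.points[1:]:
            if x <= x1:
                # Linearly interpolate between neighbouring breakpoints.
                return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
            x0, y0 = x1, y1
        return float(y0)  # past the last breakpoint: hold the final value


# E.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 3_488_506  # a batch_count logged above
print(float(dropout_p))  # 0.1 -- schedules are flat this late in training
```

By batch_count ~3.49e+06 every schedule is far past its last breakpoint, which is why the logged values (ans=0.125, 0.1, 0.2, ...) no longer change between entries.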
], tot_loss[loss=0.06549, simple_loss=0.0889, pruned_loss=0.01207, audio_tagging_loss=0.008973, over 3038920.85 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:13:56,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3489240.0, ans=0.2 2023-11-26 17:14:01,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3489240.0, ans=0.025 2023-11-26 17:14:09,538 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523400 2023-11-26 17:14:28,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3489440.0, ans=0.0 2023-11-26 17:14:32,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-26 17:14:40,439 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6400, loss[loss=0.0675, simple_loss=0.08283, pruned_loss=0.01605, audio_tagging_loss=0.01003, over 14468.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08997, pruned_loss=0.01235, audio_tagging_loss=0.008998, over 3039555.22 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:14:41,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3489506.6666666665, ans=0.0 2023-11-26 17:14:42,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5 2023-11-26 17:14:45,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.723e+01 9.338e+01 1.021e+02 1.186e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 17:14:59,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3489573.3333333335, ans=0.0 2023-11-26 17:15:05,625 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523450 2023-11-26 17:15:11,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3489640.0, ans=0.125 2023-11-26 17:15:11,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3489640.0, ans=0.0 2023-11-26 17:15:23,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3489706.6666666665, ans=0.2 2023-11-26 17:15:28,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489773.3333333335, ans=0.1 2023-11-26 17:15:28,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3489773.3333333335, ans=0.125 2023-11-26 17:15:37,400 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6450, loss[loss=0.0622, simple_loss=0.08401, pruned_loss=0.01141, audio_tagging_loss=0.008782, over 14802.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08943, pruned_loss=0.01219, audio_tagging_loss=0.00905, over 3042032.29 frames. 
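[Annotation] Each train_asr.py entry decomposes loss into simple_loss, pruned_loss and audio_tagging_loss. The logged numbers are consistent with a fixed linear combination, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; for batch 6400 above, 0.5 * 0.08997 + 0.01235 + 0.008998 ~ 0.06634, matching the logged tot_loss. A hedged reconstruction (the 0.5 and 1.0 weights are read off the logged values; the recipe may additionally ramp them during warm-up):

```python
def combine_losses(
    simple_loss: float,
    pruned_loss: float,
    audio_tagging_loss: float,
    simple_loss_scale: float = 0.5,       # inferred from the logged numbers
    audio_tagging_loss_scale: float = 1.0,  # inferred from the logged numbers
) -> float:
    """Total loss as implied by the logged decomposition."""
    return (
        simple_loss_scale * simple_loss
        + pruned_loss
        + audio_tagging_loss_scale * audio_tagging_loss
    )


# Batch 6400 from the log: tot_loss[loss=0.06634, ...]
print(combine_losses(0.08997, 0.01235, 0.008998))  # ~0.06634
```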
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:15:48,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3489906.6666666665, ans=0.0 2023-11-26 17:15:51,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3489906.6666666665, ans=0.125 2023-11-26 17:16:01,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523500 2023-11-26 17:16:06,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3489973.3333333335, ans=0.1 2023-11-26 17:16:10,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3490040.0, ans=0.0 2023-11-26 17:16:21,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=22.5 2023-11-26 17:16:25,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3490106.6666666665, ans=0.125 2023-11-26 17:16:25,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=22.5 2023-11-26 17:16:29,235 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0 2023-11-26 17:16:32,619 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6500, loss[loss=0.05045, simple_loss=0.07327, pruned_loss=0.004713, audio_tagging_loss=0.009102, over 16524.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08906, pruned_loss=0.01217, audio_tagging_loss=0.009139, over 3043920.60 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:16:36,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3490173.3333333335, ans=0.2 2023-11-26 17:16:37,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.749e+01 9.498e+01 1.005e+02 1.590e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 17:16:44,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3490240.0, ans=0.125 2023-11-26 17:16:45,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3490240.0, ans=15.0 2023-11-26 17:16:57,459 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523550 2023-11-26 17:17:00,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0 2023-11-26 17:17:28,337 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6550, loss[loss=0.07016, simple_loss=0.09676, pruned_loss=0.0118, audio_tagging_loss=0.009973, over 16180.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08961, pruned_loss=0.0122, audio_tagging_loss=0.008954, over 3051037.40 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:17:32,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. 
limit=6.0 2023-11-26 17:17:53,225 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523600 2023-11-26 17:18:24,779 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6600, loss[loss=0.05776, simple_loss=0.07529, pruned_loss=0.01131, audio_tagging_loss=0.008808, over 14673.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08959, pruned_loss=0.01222, audio_tagging_loss=0.008724, over 3046509.09 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:18:30,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.733e+01 9.478e+01 1.039e+02 1.396e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 17:18:40,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3490906.6666666665, ans=0.0 2023-11-26 17:18:45,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3490973.3333333335, ans=0.2 2023-11-26 17:18:48,746 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523650 2023-11-26 17:18:58,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2023-11-26 17:19:07,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.82 vs. limit=15.0 2023-11-26 17:19:20,544 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6650, loss[loss=0.06277, simple_loss=0.08664, pruned_loss=0.01348, audio_tagging_loss=0.005965, over 14771.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08861, pruned_loss=0.01201, audio_tagging_loss=0.008679, over 3052260.56 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:19:45,631 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523700 2023-11-26 17:19:47,184 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2023-11-26 17:19:49,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-26 17:20:06,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3491440.0, ans=0.2 2023-11-26 17:20:09,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3491440.0, ans=0.2 2023-11-26 17:20:15,803 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6700, loss[loss=0.07113, simple_loss=0.1073, pruned_loss=0.009995, audio_tagging_loss=0.007486, over 15107.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08761, pruned_loss=0.01181, audio_tagging_loss=0.00866, over 3053076.15 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:20:21,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.660e+01 9.419e+01 1.025e+02 1.437e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 17:20:41,453 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523750 2023-11-26 17:21:08,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. 
limit=15.0 2023-11-26 17:21:12,562 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6750, loss[loss=0.04263, simple_loss=0.05327, pruned_loss=0.007612, audio_tagging_loss=0.008381, over 14091.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08771, pruned_loss=0.0118, audio_tagging_loss=0.00865, over 3045139.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:21:15,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-26 17:21:28,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3491906.6666666665, ans=0.0 2023-11-26 17:21:30,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3491906.6666666665, ans=0.125 2023-11-26 17:21:36,689 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523800 2023-11-26 17:21:49,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3492040.0, ans=0.125 2023-11-26 17:21:56,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=22.5 2023-11-26 17:22:08,689 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6800, loss[loss=0.07923, simple_loss=0.1158, pruned_loss=0.01579, audio_tagging_loss=0.005551, over 15686.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08843, pruned_loss=0.01192, audio_tagging_loss=0.008712, over 3047976.01 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:22:13,932 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 8.963e+01 9.408e+01 1.006e+02 1.345e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 17:22:29,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0 2023-11-26 17:22:33,152 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523850 2023-11-26 17:22:39,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3492306.6666666665, ans=0.125 2023-11-26 17:22:51,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3492373.3333333335, ans=0.125 2023-11-26 17:22:53,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3492440.0, ans=0.125 2023-11-26 17:22:56,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3492440.0, ans=0.125 2023-11-26 17:23:03,564 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6850, loss[loss=0.03853, simple_loss=0.04466, pruned_loss=0.004855, audio_tagging_loss=0.01135, over 14942.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.0878, pruned_loss=0.01184, audio_tagging_loss=0.008617, over 3044022.42 frames. 
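[Annotation] Note the "over N frames" figure in the tot_loss[...] entries hovers around 3.0e6 and is fractional (e.g. 3044022.42 just above) instead of growing without bound: the tracker is evidently a decayed frame-weighted sum, not a plain cumulative average. A sketch of that mechanism follows; the decay of 1/200 per batch is an assumption, chosen because it saturates at roughly 200 x ~15k frames per batch ~ 3.0e6, matching the logged window.

```python
class MetricsTracker(dict):
    """Decayed frame-weighted sums, matching the tot_loss[...] lines (sketch)."""

    def __mul__(self, alpha: float) -> "MetricsTracker":
        return MetricsTracker({k: v * alpha for k, v in self.items()})

    def __add__(self, other: "MetricsTracker") -> "MetricsTracker":
        out = MetricsTracker(self)
        for k, v in other.items():
            out[k] = out.get(k, 0.0) + v
        return out

    def norm(self, key: str) -> float:
        # Frame-weighted average of a tracked quantity.
        return self[key] / self["frames"]


decay = 1.0 - 1.0 / 200.0  # assumed reset interval of 200 batches
tot = MetricsTracker({"loss": 0.0, "frames": 0.0})
batch = MetricsTracker({"loss": 0.065 * 15000, "frames": 15000.0})
for _ in range(2000):
    tot = tot * decay + batch
print(tot["frames"])     # saturates near 200 * 15000 = 3.0e6 frames
print(tot.norm("loss"))  # ~0.065, the slowly-moving tot_loss value
```

The exponential decay also explains why tot_loss moves so slowly between batches while the per-batch loss[...] values jump around.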
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:23:06,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-26 17:23:10,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-26 17:23:24,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3492573.3333333335, ans=0.0 2023-11-26 17:23:25,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3492640.0, ans=0.125 2023-11-26 17:23:27,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3492640.0, ans=0.125 2023-11-26 17:23:29,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523900 2023-11-26 17:23:45,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3492706.6666666665, ans=0.0 2023-11-26 17:23:51,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3492773.3333333335, ans=0.125 2023-11-26 17:23:53,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2023-11-26 17:23:53,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=22.5 2023-11-26 17:23:58,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492840.0, ans=0.1 2023-11-26 17:23:59,742 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6900, loss[loss=0.04141, simple_loss=0.04768, pruned_loss=0.00603, audio_tagging_loss=0.01154, over 15905.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.0884, pruned_loss=0.01186, audio_tagging_loss=0.008671, over 3049403.21 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:24:07,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.659e+01 9.332e+01 1.017e+02 1.232e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 17:24:13,900 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=12.0 2023-11-26 17:24:24,253 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523950 2023-11-26 17:24:36,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3493040.0, ans=0.05 2023-11-26 17:24:38,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3493040.0, ans=0.125 2023-11-26 17:24:39,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2023-11-26 17:24:44,350 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:24:49,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3493106.6666666665, ans=0.125 2023-11-26 17:24:51,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3493106.6666666665, ans=0.0 2023-11-26 17:24:56,064 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6950, loss[loss=0.04625, simple_loss=0.06294, pruned_loss=0.007529, audio_tagging_loss=0.007246, over 14821.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.0905, pruned_loss=0.01225, audio_tagging_loss=0.008468, over 3051279.35 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:24:59,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3493173.3333333335, ans=0.0 2023-11-26 17:25:06,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3493240.0, ans=0.1 2023-11-26 17:25:19,481 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524000 2023-11-26 17:25:39,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3493373.3333333335, ans=0.0 2023-11-26 17:25:44,411 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:25:46,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3493440.0, ans=0.0 2023-11-26 17:25:47,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0 2023-11-26 17:25:53,706 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7000, loss[loss=0.06471, simple_loss=0.09145, pruned_loss=0.01297, audio_tagging_loss=0.006016, over 14982.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0908, pruned_loss=0.01244, audio_tagging_loss=0.008488, over 3046127.94 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:26:00,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.888e+01 9.554e+01 1.025e+02 1.624e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 17:26:00,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3493506.6666666665, ans=0.125 2023-11-26 17:26:19,191 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524050 2023-11-26 17:26:33,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3493706.6666666665, ans=0.0 2023-11-26 17:26:40,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=15.0 2023-11-26 17:26:49,083 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7050, loss[loss=0.05451, simple_loss=0.05952, pruned_loss=0.01473, audio_tagging_loss=0.01002, over 14639.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08933, pruned_loss=0.01226, audio_tagging_loss=0.008651, over 3049027.07 frames. 
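[Annotation] The WARNING above drops an AudioSet cut whose transcript is a 24-token dummy placeholder (AudioSet cuts have no real transcript; they are used for the audio-tagging objective) but whose 1-second audio yields only 23 frames after the encoder frontend's 4x subsampling: the transducer loss needs at least as many encoder frames as output tokens. The logged before/after counts (100 -> 23) match the usual icefall Conv2dSubsampling arithmetic, T' = ((T - 7) // 2 + 1) // 2. A sketch of the filter, with the caveat that the exact formula lives in the recipe and may differ by a frame:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # ((100 - 7) // 2 + 1) // 2 == 23, matching the warning above.
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts that cannot be aligned: fewer encoder frames than tokens."""
    return frames_after_subsampling(num_frames) >= num_tokens


print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning
```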
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:26:58,470 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.33 vs. limit=10.0 2023-11-26 17:27:05,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=22.5 2023-11-26 17:27:14,140 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524100 2023-11-26 17:27:19,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3493973.3333333335, ans=0.1 2023-11-26 17:27:20,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3493973.3333333335, ans=0.0 2023-11-26 17:27:22,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3494040.0, ans=0.125 2023-11-26 17:27:42,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3494106.6666666665, ans=0.2 2023-11-26 17:27:46,091 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7100, loss[loss=0.07164, simple_loss=0.09444, pruned_loss=0.01395, audio_tagging_loss=0.01047, over 15530.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08992, pruned_loss=0.01236, audio_tagging_loss=0.00874, over 3055234.36 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:27:48,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2023-11-26 17:27:52,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.801e+01 9.665e+01 1.022e+02 1.314e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 17:28:07,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=22.5 2023-11-26 17:28:09,386 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524150 2023-11-26 17:28:16,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3494306.6666666665, ans=0.125 2023-11-26 17:28:16,971 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:28:22,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3494373.3333333335, ans=0.125 2023-11-26 17:28:36,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3494440.0, ans=10.0 2023-11-26 17:28:40,569 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7150, loss[loss=0.07157, simple_loss=0.09763, pruned_loss=0.01508, audio_tagging_loss=0.007675, over 15682.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09008, pruned_loss=0.01241, audio_tagging_loss=0.00881, over 3052696.51 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:28:44,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3494506.6666666665, ans=0.125 2023-11-26 17:28:49,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3494506.6666666665, ans=0.0 2023-11-26 17:28:53,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3494573.3333333335, ans=0.125 2023-11-26 17:29:05,535 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524200 2023-11-26 17:29:28,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-11-26 17:29:28,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3494773.3333333335, ans=0.0 2023-11-26 17:29:36,126 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7200, loss[loss=0.06276, simple_loss=0.0817, pruned_loss=0.01051, audio_tagging_loss=0.0114, over 15548.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08951, pruned_loss=0.01224, audio_tagging_loss=0.008908, over 3052636.79 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:29:40,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3494840.0, ans=0.125 2023-11-26 17:29:43,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.953e+01 9.579e+01 1.038e+02 1.531e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 17:29:45,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3494840.0, ans=0.1 2023-11-26 17:29:59,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3494973.3333333335, ans=0.0 2023-11-26 17:30:01,785 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524250 2023-11-26 17:30:04,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3494973.3333333335, ans=0.125 2023-11-26 17:30:31,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-26 17:30:32,729 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7250, loss[loss=0.05926, simple_loss=0.07618, pruned_loss=0.01205, audio_tagging_loss=0.009122, over 15727.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08932, pruned_loss=0.01212, audio_tagging_loss=0.009065, over 3045776.49 frames. 
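[Annotation] The scaling.py "Whitening: ... metric=X vs. limit=Y" entries above sample a per-module whiteness statistic: a measure of how anisotropic the feature covariance of each channel group is, with a regularising nudge presumably applied only when the metric exceeds its limit (entries with metric well under the limit, like 2.82 vs. 15.0, suggest these lines are periodic samples rather than violations). One plausible definition consistent with the logged magnitudes is d * tr(C^2) / tr(C)^2, which is 1.0 for white features and up to d when all variance collapses into one direction; the exact formula in icefall's scaling.py may differ in normalisation.

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Covariance anisotropy per channel group: 1.0 if white, up to
    channels_per_group if all variance sits in one direction (sketch)."""
    num_channels = x.shape[-1]
    cpg = num_channels // num_groups
    x = x.reshape(-1, num_groups, cpg).transpose(0, 1)  # (groups, frames, cpg)
    cov = x.transpose(1, 2) @ x / x.shape[1]            # (groups, cpg, cpg)
    tr = cov.diagonal(dim1=1, dim2=2).sum(-1)           # trace(C)
    tr_sq = (cov @ cov).diagonal(dim1=1, dim2=2).sum(-1)  # trace(C^2)
    return (cpg * tr_sq / tr.pow(2)).mean().item()


feats = torch.randn(1000, 384)
print(whitening_metric(feats))  # ~1.0: white features
collapsed = feats[:, :1].repeat(1, 384)
print(whitening_metric(collapsed))  # ~384: fully correlated channels
```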
], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:30:32,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3495173.3333333335, ans=0.125 2023-11-26 17:30:45,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3495240.0, ans=0.04949747468305833 2023-11-26 17:30:50,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3495240.0, ans=0.2 2023-11-26 17:30:54,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3495306.6666666665, ans=0.125 2023-11-26 17:30:56,600 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524300 2023-11-26 17:31:02,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3495306.6666666665, ans=0.125 2023-11-26 17:31:28,171 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7300, loss[loss=0.06334, simple_loss=0.09747, pruned_loss=0.006227, audio_tagging_loss=0.008381, over 15895.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.0891, pruned_loss=0.01217, audio_tagging_loss=0.00897, over 3037522.36 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:31:29,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3495506.6666666665, ans=0.2 2023-11-26 17:31:35,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.863e+01 9.398e+01 1.006e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 17:31:52,479 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524350 2023-11-26 17:32:06,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3495706.6666666665, ans=0.1 2023-11-26 17:32:19,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.20 vs. limit=5.0 2023-11-26 17:32:23,231 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7350, loss[loss=0.04604, simple_loss=0.06489, pruned_loss=0.005968, audio_tagging_loss=0.007625, over 15720.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08946, pruned_loss=0.01221, audio_tagging_loss=0.008843, over 3036215.01 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:32:32,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3495840.0, ans=0.0 2023-11-26 17:32:45,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3495973.3333333335, ans=0.0 2023-11-26 17:32:46,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3495973.3333333335, ans=0.0 2023-11-26 17:32:48,578 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524400 2023-11-26 17:33:20,277 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7400, loss[loss=0.05522, simple_loss=0.06476, pruned_loss=0.009132, audio_tagging_loss=0.01371, over 15442.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08895, pruned_loss=0.0122, audio_tagging_loss=0.008745, over 3034084.40 frames. 
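[Annotation] The grad_scale value in the train_asr.py entries flips between 32.0 and 16.0 across this section: the fp16 run uses dynamic loss scaling, where the scale is halved after a step with inf/nan gradients and doubled back after a stretch of clean steps. A sketch using PyTorch's stock GradScaler; icefall wraps its own checks around this, and the growth_interval below is PyTorch's default, assumed here since the run's setting is not logged.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the grad_scale values logged here
    growth_factor=2.0,     # 16.0 -> 32.0 after enough clean steps
    backoff_factor=0.5,    # 32.0 -> 16.0 on an inf/nan gradient
    growth_interval=2000,  # PyTorch default; assumed, not logged
)

# Inside the training loop (sketch):
# with torch.cuda.amp.autocast():
#     loss = compute_loss(batch)
# scaler.scale(loss).backward()
# scaler.step(optimizer)   # skips the step if gradients overflowed
# scaler.update()          # adjusts the scale, i.e. the logged grad_scale
# print(scaler.get_scale())
```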
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:33:23,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3496173.3333333335, ans=0.125 2023-11-26 17:33:27,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 8.991e+01 9.600e+01 1.026e+02 1.969e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 17:33:35,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2023-11-26 17:33:44,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-26 17:33:45,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3496306.6666666665, ans=0.125 2023-11-26 17:33:49,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-11-26 17:33:50,667 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:33:52,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2023-11-26 17:33:55,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.16 vs. limit=15.0 2023-11-26 17:34:05,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3496440.0, ans=0.0 2023-11-26 17:34:07,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3496440.0, ans=0.0 2023-11-26 17:34:10,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2023-11-26 17:34:15,322 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7450, loss[loss=0.06892, simple_loss=0.08807, pruned_loss=0.01289, audio_tagging_loss=0.01199, over 14639.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08903, pruned_loss=0.01221, audio_tagging_loss=0.008667, over 3036475.92 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:34:40,357 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-26 17:34:49,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2023-11-26 17:35:03,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=22.5 2023-11-26 17:35:11,143 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7500, loss[loss=0.06369, simple_loss=0.09128, pruned_loss=0.0116, audio_tagging_loss=0.006449, over 14229.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08875, pruned_loss=0.01208, audio_tagging_loss=0.00867, over 3032006.94 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:35:13,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3496840.0, ans=15.0 2023-11-26 17:35:19,591 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.894e+01 9.302e+01 1.027e+02 1.348e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-26 17:35:19,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3496840.0, ans=0.0 2023-11-26 17:35:22,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3496906.6666666665, ans=0.1 2023-11-26 17:35:36,263 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-26 17:35:40,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3496973.3333333335, ans=0.125 2023-11-26 17:35:41,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3496973.3333333335, ans=0.125 2023-11-26 17:36:04,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3497106.6666666665, ans=0.125 2023-11-26 17:36:06,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3497173.3333333335, ans=0.0 2023-11-26 17:36:07,361 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7550, loss[loss=0.07233, simple_loss=0.09135, pruned_loss=0.01717, audio_tagging_loss=0.009479, over 15614.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08945, pruned_loss=0.01225, audio_tagging_loss=0.008692, over 3041784.08 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:36:15,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3497173.3333333335, ans=0.125 2023-11-26 17:36:24,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3497240.0, ans=0.0 2023-11-26 17:36:32,054 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-26 17:36:39,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3497373.3333333335, ans=10.0 2023-11-26 17:36:41,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-11-26 17:36:48,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=12.0 2023-11-26 17:36:50,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3497373.3333333335, ans=0.125 2023-11-26 17:36:56,456 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.19 vs. 
limit=15.0 2023-11-26 17:37:00,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3497440.0, ans=0.125 2023-11-26 17:37:01,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3497440.0, ans=0.09899494936611666 2023-11-26 17:37:03,387 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7600, loss[loss=0.04489, simple_loss=0.05617, pruned_loss=0.005961, audio_tagging_loss=0.01084, over 16556.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08864, pruned_loss=0.01208, audio_tagging_loss=0.008719, over 3047704.62 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:37:05,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-26 17:37:10,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.779e+01 9.690e+01 1.052e+02 1.195e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 17:37:20,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3497573.3333333335, ans=0.0 2023-11-26 17:37:21,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2023-11-26 17:37:27,965 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-26 17:37:35,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0 2023-11-26 17:37:57,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3497773.3333333335, ans=0.125 2023-11-26 17:37:58,859 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7650, loss[loss=0.04939, simple_loss=0.0688, pruned_loss=0.006125, audio_tagging_loss=0.008869, over 14849.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08893, pruned_loss=0.01217, audio_tagging_loss=0.00869, over 3051683.71 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:38:23,438 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-26 17:38:23,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3497973.3333333335, ans=0.125 2023-11-26 17:38:33,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3498040.0, ans=0.0 2023-11-26 17:38:41,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-11-26 17:38:50,416 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5 2023-11-26 17:38:52,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-26 17:38:54,585 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7700, loss[loss=0.04891, simple_loss=0.0688, pruned_loss=0.005798, audio_tagging_loss=0.008715, over 16313.00 frames. 
], tot_loss[loss=0.06569, simple_loss=0.08936, pruned_loss=0.01229, audio_tagging_loss=0.00871, over 3052602.62 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:38:58,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3498173.3333333335, ans=0.0 2023-11-26 17:39:02,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.021e+01 9.589e+01 1.019e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 17:39:18,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-26 17:39:33,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-26 17:39:39,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3498440.0, ans=0.125 2023-11-26 17:39:41,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3498440.0, ans=0.125 2023-11-26 17:39:43,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3498440.0, ans=0.1 2023-11-26 17:39:50,765 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7750, loss[loss=0.0854, simple_loss=0.1213, pruned_loss=0.01746, audio_tagging_loss=0.007276, over 15817.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09011, pruned_loss=0.01244, audio_tagging_loss=0.008665, over 3056982.42 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:39:51,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-26 17:39:59,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3498506.6666666665, ans=0.125 2023-11-26 17:40:06,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498573.3333333335, ans=0.1 2023-11-26 17:40:11,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-26 17:40:12,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3498640.0, ans=0.125 2023-11-26 17:40:15,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-26 17:40:19,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498640.0, ans=0.1 2023-11-26 17:40:19,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3498640.0, ans=0.125 2023-11-26 17:40:36,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3498773.3333333335, ans=0.125 2023-11-26 17:40:38,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498773.3333333335, ans=0.1 2023-11-26 17:40:45,532 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7800, loss[loss=0.0619, simple_loss=0.08515, pruned_loss=0.0109, audio_tagging_loss=0.008428, over 15982.00 frames. 
], tot_loss[loss=0.06526, simple_loss=0.08878, pruned_loss=0.01207, audio_tagging_loss=0.008801, over 3050232.81 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:40:54,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.862e+01 9.365e+01 1.011e+02 1.202e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:40:56,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.75 vs. limit=6.0 2023-11-26 17:41:11,075 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-26 17:41:34,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.81 vs. limit=10.0 2023-11-26 17:41:36,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3499106.6666666665, ans=0.0 2023-11-26 17:41:40,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499106.6666666665, ans=0.1 2023-11-26 17:41:41,883 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7850, loss[loss=0.0736, simple_loss=0.09736, pruned_loss=0.01395, audio_tagging_loss=0.01096, over 15081.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08985, pruned_loss=0.01228, audio_tagging_loss=0.008821, over 3054937.57 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:41:46,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3499173.3333333335, ans=15.0 2023-11-26 17:41:50,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3499173.3333333335, ans=0.0 2023-11-26 17:41:54,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499240.0, ans=0.1 2023-11-26 17:42:06,275 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-26 17:42:23,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3499373.3333333335, ans=0.1 2023-11-26 17:42:28,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3499440.0, ans=0.125 2023-11-26 17:42:36,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3499440.0, ans=0.125 2023-11-26 17:42:38,159 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7900, loss[loss=0.06655, simple_loss=0.0929, pruned_loss=0.01247, audio_tagging_loss=0.007633, over 15913.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09016, pruned_loss=0.01225, audio_tagging_loss=0.008817, over 3052631.38 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:42:40,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3499506.6666666665, ans=0.2 2023-11-26 17:42:44,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3499506.6666666665, ans=0.125 2023-11-26 17:42:46,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 9.062e+01 9.709e+01 1.039e+02 1.444e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-26 17:42:47,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3499573.3333333335, ans=0.0 2023-11-26 17:42:47,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3499573.3333333335, ans=0.0 2023-11-26 17:42:49,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3499573.3333333335, ans=0.2 2023-11-26 17:42:51,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499573.3333333335, ans=0.1 2023-11-26 17:42:58,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3499640.0, ans=0.2 2023-11-26 17:43:00,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3499640.0, ans=0.0 2023-11-26 17:43:01,416 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524950 2023-11-26 17:43:06,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3499640.0, ans=0.2 2023-11-26 17:43:07,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3499640.0, ans=0.1 2023-11-26 17:43:13,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3499706.6666666665, ans=0.0 2023-11-26 17:43:24,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3499773.3333333335, ans=0.0 2023-11-26 17:43:29,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-11-26 17:43:33,418 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7950, loss[loss=0.07569, simple_loss=0.1061, pruned_loss=0.01538, audio_tagging_loss=0.007268, over 16136.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09095, pruned_loss=0.01251, audio_tagging_loss=0.008827, over 3056988.16 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:43:34,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3499840.0, ans=0.0 2023-11-26 17:43:49,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-26 17:43:50,560 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:43:59,131 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525000 2023-11-26 17:44:00,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3499973.3333333335, ans=0.125 2023-11-26 17:44:24,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3500106.6666666665, ans=0.2 2023-11-26 17:44:28,916 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8000, loss[loss=0.07377, simple_loss=0.09728, pruned_loss=0.01827, audio_tagging_loss=0.006867, over 14408.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09044, pruned_loss=0.01259, audio_tagging_loss=0.008876, over 3047845.41 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:44:30,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2023-11-26 17:44:38,611 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.982e+01 9.655e+01 1.047e+02 1.497e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-26 17:44:44,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3500240.0, ans=0.2 2023-11-26 17:44:46,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.75 vs. limit=15.0 2023-11-26 17:44:54,189 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525050 2023-11-26 17:45:20,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3500440.0, ans=0.125 2023-11-26 17:45:25,875 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8050, loss[loss=0.07954, simple_loss=0.1031, pruned_loss=0.01752, audio_tagging_loss=0.01049, over 15230.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09024, pruned_loss=0.01256, audio_tagging_loss=0.008993, over 3049052.46 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:45:32,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3500506.6666666665, ans=0.0 2023-11-26 17:45:34,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-26 17:45:34,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-26 17:45:49,206 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525100 2023-11-26 17:45:54,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3500640.0, ans=0.125 2023-11-26 17:45:56,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3500640.0, ans=0.0 2023-11-26 17:46:11,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.40 vs. 
limit=15.0 2023-11-26 17:46:21,175 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8100, loss[loss=0.07365, simple_loss=0.1006, pruned_loss=0.01508, audio_tagging_loss=0.008288, over 14957.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09136, pruned_loss=0.01273, audio_tagging_loss=0.008898, over 3051787.44 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:46:25,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3500840.0, ans=0.0 2023-11-26 17:46:29,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.964e+01 9.424e+01 1.017e+02 1.199e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 17:46:44,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3500973.3333333335, ans=0.0 2023-11-26 17:46:45,717 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525150 2023-11-26 17:46:49,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-26 17:46:59,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3501040.0, ans=0.0 2023-11-26 17:47:16,040 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8150, loss[loss=0.06653, simple_loss=0.08505, pruned_loss=0.0142, audio_tagging_loss=0.009796, over 15051.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09115, pruned_loss=0.01277, audio_tagging_loss=0.008769, over 3054517.44 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:47:17,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3501173.3333333335, ans=0.1 2023-11-26 17:47:39,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2023-11-26 17:47:40,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3501306.6666666665, ans=0.04949747468305833 2023-11-26 17:47:41,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525200 2023-11-26 17:47:42,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3501306.6666666665, ans=0.0 2023-11-26 17:47:53,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3501373.3333333335, ans=0.125 2023-11-26 17:47:57,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3501373.3333333335, ans=0.0 2023-11-26 17:48:13,445 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8200, loss[loss=0.0505, simple_loss=0.05805, pruned_loss=0.01062, audio_tagging_loss=0.01086, over 14040.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09008, pruned_loss=0.01254, audio_tagging_loss=0.008682, over 3053251.74 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:48:14,224 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2023-11-26 17:48:16,603 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:48:21,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3501506.6666666665, ans=0.0 2023-11-26 17:48:22,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.766e+01 9.406e+01 1.001e+02 1.239e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 17:48:25,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3501573.3333333335, ans=0.1 2023-11-26 17:48:36,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525250 2023-11-26 17:48:38,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-26 17:48:38,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2023-11-26 17:49:00,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3501773.3333333335, ans=0.0 2023-11-26 17:49:08,675 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8250, loss[loss=0.08044, simple_loss=0.1205, pruned_loss=0.0146, audio_tagging_loss=0.005615, over 15155.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08998, pruned_loss=0.01247, audio_tagging_loss=0.008617, over 3056588.72 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:49:12,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3501840.0, ans=0.125 2023-11-26 17:49:30,653 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:49:32,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3501973.3333333335, ans=0.0 2023-11-26 17:49:33,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525300 2023-11-26 17:49:39,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. limit=15.0 2023-11-26 17:49:51,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3502040.0, ans=0.0 2023-11-26 17:49:53,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3502106.6666666665, ans=0.125 2023-11-26 17:50:03,820 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8300, loss[loss=0.05971, simple_loss=0.08726, pruned_loss=0.006557, audio_tagging_loss=0.009521, over 15842.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08964, pruned_loss=0.01233, audio_tagging_loss=0.008577, over 3059535.88 frames. 
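Every batch record in this stretch of epoch 44 carries lr: 1.53e-03. That value is consistent with an Eden-style schedule, in which the learning rate decays as a power law in both the batch index and the epoch. Everything below (the formula and the constants base_lr=0.045, lr_batches=7500, lr_epochs=3.5) is an assumption, reconstructed only to show that such a schedule lands near the logged value:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Assumed Eden-style decay in both batch count and epoch.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Assumed base_lr of 0.045, at batch idx ~525000 in epoch 44:
    print(eden_lr(0.045, 525_000, 44.0))  # ~1.51e-03, near the logged 1.53e-03
    # (the exact logged value would depend on the fractional epoch)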
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:50:14,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.963e+01 9.594e+01 1.033e+02 1.265e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 17:50:17,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2023-11-26 17:50:28,595 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525350 2023-11-26 17:50:37,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3502373.3333333335, ans=0.0 2023-11-26 17:50:37,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.56 vs. limit=15.0 2023-11-26 17:50:40,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3502373.3333333335, ans=0.125 2023-11-26 17:50:59,497 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8350, loss[loss=0.05188, simple_loss=0.0674, pruned_loss=0.009323, audio_tagging_loss=0.00885, over 14942.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08934, pruned_loss=0.01223, audio_tagging_loss=0.00858, over 3060005.24 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:51:06,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3502506.6666666665, ans=0.0 2023-11-26 17:51:23,989 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525400 2023-11-26 17:51:30,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3502640.0, ans=0.0 2023-11-26 17:51:40,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3502706.6666666665, ans=0.125 2023-11-26 17:51:55,980 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8400, loss[loss=0.09139, simple_loss=0.1248, pruned_loss=0.02014, audio_tagging_loss=0.008863, over 16272.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09028, pruned_loss=0.01233, audio_tagging_loss=0.008472, over 3059155.55 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:52:05,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.821e+01 9.539e+01 1.045e+02 1.757e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 17:52:09,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3502906.6666666665, ans=0.0 2023-11-26 17:52:19,648 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525450 2023-11-26 17:52:50,786 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8450, loss[loss=0.06168, simple_loss=0.08111, pruned_loss=0.0112, audio_tagging_loss=0.009926, over 15061.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09023, pruned_loss=0.01248, audio_tagging_loss=0.008573, over 3057586.09 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:52:57,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503173.3333333335, ans=0.1 2023-11-26 17:53:11,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.97 vs. 
limit=22.5 2023-11-26 17:53:15,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-11-26 17:53:16,160 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525500 2023-11-26 17:53:16,320 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:53:18,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-26 17:53:19,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503306.6666666665, ans=0.1 2023-11-26 17:53:36,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3503440.0, ans=0.0 2023-11-26 17:53:46,680 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8500, loss[loss=0.07694, simple_loss=0.1044, pruned_loss=0.01528, audio_tagging_loss=0.009438, over 15328.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09104, pruned_loss=0.01254, audio_tagging_loss=0.008617, over 3058037.22 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:53:49,043 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:53:58,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.914e+01 9.480e+01 1.019e+02 1.218e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 17:54:10,870 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525550 2023-11-26 17:54:23,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503706.6666666665, ans=0.1 2023-11-26 17:54:24,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3503706.6666666665, ans=0.0 2023-11-26 17:54:34,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503773.3333333335, ans=0.1 2023-11-26 17:54:42,936 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8550, loss[loss=0.06778, simple_loss=0.08808, pruned_loss=0.01338, audio_tagging_loss=0.01037, over 15033.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09094, pruned_loss=0.01256, audio_tagging_loss=0.00867, over 3052473.32 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:54:47,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0 2023-11-26 17:54:51,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3503840.0, ans=0.125 2023-11-26 17:54:56,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3503906.6666666665, ans=0.1 2023-11-26 17:54:59,263 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-26 17:54:59,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.99 vs. 
limit=12.0 2023-11-26 17:55:06,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525600 2023-11-26 17:55:09,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=12.0 2023-11-26 17:55:17,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3504040.0, ans=0.2 2023-11-26 17:55:37,975 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8600, loss[loss=0.06796, simple_loss=0.09544, pruned_loss=0.01378, audio_tagging_loss=0.006466, over 15358.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08957, pruned_loss=0.01239, audio_tagging_loss=0.008756, over 3052139.16 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:55:49,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.808e+01 9.575e+01 1.022e+02 1.354e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 17:56:02,968 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525650 2023-11-26 17:56:07,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3504306.6666666665, ans=10.0 2023-11-26 17:56:15,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3504373.3333333335, ans=0.2 2023-11-26 17:56:33,762 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8650, loss[loss=0.06799, simple_loss=0.08678, pruned_loss=0.01504, audio_tagging_loss=0.009563, over 15597.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08962, pruned_loss=0.01231, audio_tagging_loss=0.008796, over 3049623.55 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:56:35,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3504506.6666666665, ans=0.125 2023-11-26 17:56:58,844 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525700 2023-11-26 17:57:23,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3504773.3333333335, ans=0.09899494936611666 2023-11-26 17:57:30,011 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8700, loss[loss=0.05874, simple_loss=0.08447, pruned_loss=0.008202, audio_tagging_loss=0.008299, over 15655.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08959, pruned_loss=0.01211, audio_tagging_loss=0.008903, over 3051170.63 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:57:41,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.922e+01 9.364e+01 9.934e+01 1.569e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:57:53,934 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525750 2023-11-26 17:58:25,616 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8750, loss[loss=0.06816, simple_loss=0.09091, pruned_loss=0.01379, audio_tagging_loss=0.008911, over 15231.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09058, pruned_loss=0.01227, audio_tagging_loss=0.008917, over 3053496.38 frames. 
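The optim.py:476 records print five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms together with a clipping threshold. In every record in this excerpt the threshold is exactly Clipping_scale times the logged median: for the batch 8600 record, 2.0 * 9.575e+01 = 1.915e+02. A sketch of that bookkeeping; the window length and the per-step clip decision are assumptions:

    from collections import deque
    import torch

    class GradNormClipper:
        """Track recent gradient norms; clip at clipping_scale * running median."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.num_seen = 0
            self.num_clipped = 0

        def observe(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            t = torch.tensor(list(self.norms))
            qs = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * qs[2].item()  # 2.0 * median
            self.num_seen += 1
            if grad_norm > threshold:
                self.num_clipped += 1
            print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                  f"{qs.tolist()}, threshold={threshold:.3e}, "
                  f"percent-clipped={100.0 * self.num_clipped / self.num_seen}")
            return threshold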
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:58:27,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3505173.3333333335, ans=0.2 2023-11-26 17:58:29,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3505173.3333333335, ans=0.125 2023-11-26 17:58:50,408 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525800 2023-11-26 17:59:21,326 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8800, loss[loss=0.05267, simple_loss=0.06975, pruned_loss=0.007618, audio_tagging_loss=0.01018, over 14420.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09073, pruned_loss=0.01229, audio_tagging_loss=0.008924, over 3049028.74 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:59:32,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.136e+01 9.670e+01 1.038e+02 1.622e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 17:59:45,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525850 2023-11-26 17:59:49,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3505640.0, ans=0.125 2023-11-26 17:59:50,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3505640.0, ans=0.07 2023-11-26 17:59:52,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3505640.0, ans=0.04949747468305833 2023-11-26 18:00:04,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2023-11-26 18:00:06,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3505773.3333333335, ans=0.0 2023-11-26 18:00:17,404 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8850, loss[loss=0.06332, simple_loss=0.08794, pruned_loss=0.009666, audio_tagging_loss=0.009688, over 14325.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09023, pruned_loss=0.01214, audio_tagging_loss=0.008915, over 3051732.35 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:00:30,096 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 18:00:36,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=3505906.6666666665, ans=15.0 2023-11-26 18:00:41,265 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525900 2023-11-26 18:00:42,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3505973.3333333335, ans=0.125 2023-11-26 18:00:43,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3505973.3333333335, ans=0.125 2023-11-26 18:00:47,124 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:01:01,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5 2023-11-26 18:01:03,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3506106.6666666665, ans=0.125 2023-11-26 18:01:12,212 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8900, loss[loss=0.07397, simple_loss=0.1038, pruned_loss=0.01301, audio_tagging_loss=0.009091, over 15204.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09082, pruned_loss=0.0122, audio_tagging_loss=0.008795, over 3057958.09 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:01:16,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3506173.3333333335, ans=0.125 2023-11-26 18:01:24,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.724e+01 9.341e+01 9.971e+01 1.167e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:01:27,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3506240.0, ans=0.1 2023-11-26 18:01:30,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3506240.0, ans=0.1 2023-11-26 18:01:37,696 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525950 2023-11-26 18:01:59,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506440.0, ans=0.125 2023-11-26 18:02:00,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506440.0, ans=0.125 2023-11-26 18:02:05,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2023-11-26 18:02:07,761 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8950, loss[loss=0.05164, simple_loss=0.07545, pruned_loss=0.006187, audio_tagging_loss=0.007728, over 14910.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09003, pruned_loss=0.01212, audio_tagging_loss=0.00866, over 3058910.84 frames. 
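Most scaling.py:213 lines report a ScheduledFloat: a named hyperparameter (dropout probabilities, skip rates, balancer probs) whose current value, printed as ans=..., is a function of the module's batch_count. The plateaued values seen here (0.0, 0.1, 0.125, 0.2, ...) at ~3.5M batches fit a value interpolated along a piecewise-linear schedule; the breakpoint mechanics below are an assumption, not scaling.py's actual code:

    import bisect

    class ScheduledFloat:
        """A float whose value is piecewise-linear in batch_count."""

        def __init__(self, *points: tuple[float, float]):
            # points: (batch_count, value) pairs, sorted by batch_count.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # E.g. a prob that decays from 0.3 to 0.125 and then stays there:
    prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
    print(prob.value(3_500_000.0))  # -> 0.125, as in the ans=0.125 records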
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:02:14,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3506506.6666666665, ans=0.2 2023-11-26 18:02:27,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3506573.3333333335, ans=0.1 2023-11-26 18:02:27,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3506573.3333333335, ans=0.125 2023-11-26 18:02:32,637 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526000 2023-11-26 18:02:39,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3506640.0, ans=0.2 2023-11-26 18:03:04,588 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9000, loss[loss=0.05108, simple_loss=0.06482, pruned_loss=0.007986, audio_tagging_loss=0.01068, over 14243.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08979, pruned_loss=0.01208, audio_tagging_loss=0.008589, over 3056144.36 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:03:04,589 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 18:03:36,869 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05857, simple_loss=0.05054, pruned_loss=0.005271, audio_tagging_loss=0.02803, over 4681554.00 frames. 2023-11-26 18:03:36,869 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 18:03:46,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2023-11-26 18:03:50,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 9.015e+01 9.647e+01 1.018e+02 1.400e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 18:03:56,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3506906.6666666665, ans=0.125 2023-11-26 18:04:02,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526050 2023-11-26 18:04:08,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3506973.3333333335, ans=0.125 2023-11-26 18:04:32,807 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9050, loss[loss=0.05605, simple_loss=0.07224, pruned_loss=0.01036, audio_tagging_loss=0.009573, over 15800.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09124, pruned_loss=0.01237, audio_tagging_loss=0.008421, over 3057911.41 frames. 
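At batch 9000 the script pauses for its periodic validation pass: "Computing validation loss", a frame-weighted result over 4681554 frames, then the peak CUDA memory. A minimal sketch of that pass; the batch layout and the loss callable are placeholders, not the script's actual interfaces:

    import torch

    def compute_validation_loss(model, valid_dl, loss_fn, device="cuda:2"):
        # loss_fn is a placeholder: (model, batch) -> (loss tensor, num_frames).
        # Assumes a CUDA device, as in the run above.
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = loss_fn(model, batch)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4g}, "
              f"over {tot_frames:.2f} frames.")
        print(f"Maximum memory allocated so far is {mem_mb}MB")
        return tot_loss / tot_frames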
], batch size: 63, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:04:43,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3507240.0, ans=0.125 2023-11-26 18:04:45,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3507240.0, ans=0.125 2023-11-26 18:04:57,321 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526100 2023-11-26 18:05:09,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3507373.3333333335, ans=0.0 2023-11-26 18:05:10,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3507373.3333333335, ans=0.0 2023-11-26 18:05:11,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-26 18:05:13,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3507373.3333333335, ans=0.0 2023-11-26 18:05:24,412 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=12.0 2023-11-26 18:05:29,288 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9100, loss[loss=0.05832, simple_loss=0.08511, pruned_loss=0.008986, audio_tagging_loss=0.00678, over 14961.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09099, pruned_loss=0.01226, audio_tagging_loss=0.008408, over 3056322.71 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:05:41,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3507573.3333333335, ans=0.2 2023-11-26 18:05:41,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.875e+01 9.431e+01 1.015e+02 1.268e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 18:05:53,185 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526150 2023-11-26 18:05:57,113 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:06:06,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3507706.6666666665, ans=10.0 2023-11-26 18:06:13,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3507773.3333333335, ans=0.2 2023-11-26 18:06:14,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3507773.3333333335, ans=0.125 2023-11-26 18:06:24,509 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9150, loss[loss=0.06168, simple_loss=0.07861, pruned_loss=0.01295, audio_tagging_loss=0.009422, over 15562.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.0907, pruned_loss=0.01233, audio_tagging_loss=0.008502, over 3051326.09 frames. 
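The scaling.py:1022 Whitening records compare a per-module statistic against a limit (e.g. metric=5.74 vs. limit=12.0): while the statistic stays below the limit, the channel covariance of the module's output is treated as close enough to white and no corrective signal is needed. The statistic itself is internal to scaling.py; the proxy below, the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which equals 1.0 for perfectly white features, is an illustration rather than the library's definition:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Illustrative proxy for how far channel covariance is from white.
        Returns ~1.0 for white features, larger when few directions dominate.
        x: (num_frames, num_channels)."""
        metrics = []
        for g in x.chunk(num_groups, dim=-1):
            g = g - g.mean(dim=0, keepdim=True)
            cov = (g.T @ g) / g.shape[0]        # (c, c) channel covariance
            eigs = torch.linalg.eigvalsh(cov)   # eigenvalues, ascending
            c = eigs.numel()
            metrics.append((c * (eigs ** 2).sum() / eigs.sum() ** 2).item())
        return max(metrics)

    x = torch.randn(1000, 256)                  # near-white input
    print(whitening_metric(x, num_groups=1))    # close to 1.0, well under limit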
], batch size: 60, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:06:41,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3507906.6666666665, ans=0.0 2023-11-26 18:06:50,131 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526200 2023-11-26 18:07:01,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3508040.0, ans=0.125 2023-11-26 18:07:13,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3508106.6666666665, ans=0.125 2023-11-26 18:07:20,632 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9200, loss[loss=0.05514, simple_loss=0.0803, pruned_loss=0.007521, audio_tagging_loss=0.007466, over 16073.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09039, pruned_loss=0.01226, audio_tagging_loss=0.008555, over 3045320.71 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:07:26,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3508173.3333333335, ans=0.125 2023-11-26 18:07:34,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.887e+01 9.428e+01 1.018e+02 1.949e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-26 18:07:45,637 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526250 2023-11-26 18:07:50,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2023-11-26 18:07:57,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3508373.3333333335, ans=0.09899494936611666 2023-11-26 18:08:06,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3508440.0, ans=0.125 2023-11-26 18:08:17,407 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9250, loss[loss=0.06527, simple_loss=0.08215, pruned_loss=0.01185, audio_tagging_loss=0.01235, over 15744.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09085, pruned_loss=0.01235, audio_tagging_loss=0.008537, over 3050134.20 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:08:18,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3508506.6666666665, ans=0.1 2023-11-26 18:08:20,174 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2023-11-26 18:08:40,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. 
limit=15.0 2023-11-26 18:08:40,780 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526300 2023-11-26 18:08:40,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3508640.0, ans=0.05 2023-11-26 18:08:42,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3508640.0, ans=0.0 2023-11-26 18:08:45,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3508640.0, ans=0.125 2023-11-26 18:08:47,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-26 18:09:05,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3508773.3333333335, ans=0.2 2023-11-26 18:09:12,312 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9300, loss[loss=0.08511, simple_loss=0.1245, pruned_loss=0.01795, audio_tagging_loss=0.004903, over 16045.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08983, pruned_loss=0.01225, audio_tagging_loss=0.008559, over 3046760.71 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:09:15,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3508840.0, ans=0.1 2023-11-26 18:09:25,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.680e+01 9.500e+01 1.023e+02 1.279e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:09:29,037 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:09:31,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3508906.6666666665, ans=0.125 2023-11-26 18:09:37,822 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526350 2023-11-26 18:09:46,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3509040.0, ans=0.125 2023-11-26 18:09:55,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0 2023-11-26 18:09:57,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3509106.6666666665, ans=0.0 2023-11-26 18:10:06,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3509173.3333333335, ans=0.2 2023-11-26 18:10:07,396 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9350, loss[loss=0.05791, simple_loss=0.08042, pruned_loss=0.01101, audio_tagging_loss=0.006686, over 16399.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08887, pruned_loss=0.01215, audio_tagging_loss=0.008591, over 3038096.56 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:10:07,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. 
limit=12.0 2023-11-26 18:10:32,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3509306.6666666665, ans=0.2 2023-11-26 18:10:33,015 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526400 2023-11-26 18:10:39,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3509306.6666666665, ans=0.125 2023-11-26 18:10:50,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3509373.3333333335, ans=0.2 2023-11-26 18:10:50,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3509373.3333333335, ans=0.125 2023-11-26 18:11:04,588 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9400, loss[loss=0.07425, simple_loss=0.1092, pruned_loss=0.01258, audio_tagging_loss=0.007087, over 15747.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08853, pruned_loss=0.0121, audio_tagging_loss=0.008675, over 3037628.36 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:11:17,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.876e+01 9.519e+01 1.023e+02 1.284e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 18:11:17,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3509573.3333333335, ans=0.125 2023-11-26 18:11:20,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3509573.3333333335, ans=0.2 2023-11-26 18:11:27,908 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526450 2023-11-26 18:11:36,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=22.5 2023-11-26 18:11:39,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-26 18:11:47,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3509706.6666666665, ans=0.2 2023-11-26 18:11:49,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509773.3333333335, ans=0.1 2023-11-26 18:11:59,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-26 18:11:59,794 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9450, loss[loss=0.07313, simple_loss=0.1056, pruned_loss=0.01414, audio_tagging_loss=0.006184, over 15852.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08904, pruned_loss=0.01215, audio_tagging_loss=0.0087, over 3041619.78 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:12:00,893 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 18:12:13,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3509906.6666666665, ans=0.125 2023-11-26 18:12:16,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3509906.6666666665, ans=0.0 2023-11-26 18:12:24,995 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526500 2023-11-26 18:12:29,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3509973.3333333335, ans=0.125 2023-11-26 18:12:49,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3510106.6666666665, ans=0.025 2023-11-26 18:12:55,192 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9500, loss[loss=0.06727, simple_loss=0.0995, pruned_loss=0.01031, audio_tagging_loss=0.007206, over 15482.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.0904, pruned_loss=0.01257, audio_tagging_loss=0.008654, over 3041785.83 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:13:06,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3510240.0, ans=0.2 2023-11-26 18:13:09,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.837e+01 9.537e+01 1.034e+02 1.299e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 18:13:20,779 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526550 2023-11-26 18:13:21,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3510306.6666666665, ans=0.125 2023-11-26 18:13:48,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3510440.0, ans=0.0 2023-11-26 18:13:51,806 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9550, loss[loss=0.04446, simple_loss=0.05254, pruned_loss=0.007681, audio_tagging_loss=0.01051, over 14130.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08976, pruned_loss=0.01251, audio_tagging_loss=0.008757, over 3035321.34 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:13:52,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=12.0 2023-11-26 18:14:06,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3510573.3333333335, ans=0.0 2023-11-26 18:14:10,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3510573.3333333335, ans=0.2 2023-11-26 18:14:15,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526600 2023-11-26 18:14:26,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3510706.6666666665, ans=0.0 2023-11-26 18:14:45,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3510773.3333333335, ans=0.125 2023-11-26 18:14:47,796 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9600, loss[loss=0.06072, simple_loss=0.08406, pruned_loss=0.009975, audio_tagging_loss=0.00871, over 14771.00 frames. 
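The recurring WARNINGs all reject one-second AudioSet clips that carry the same placeholder transcript: 100 feature frames shrink to 23 after the encoder's subsampling, fewer than the 24 BPE tokens, and the training loop drops cuts it cannot align under the transducer loss. A sketch of such a filter, where the subsampling arithmetic (n - 8) // 4 is an assumption chosen only to reproduce the logged 100 -> 23:

    def num_frames_after_subsampling(num_frames: int) -> int:
        # Assumed approximation of the conv front-end's ~4x reduction;
        # reproduces the logged example: 100 -> 23.
        return (num_frames - 8) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts with fewer post-subsampling frames than BPE tokens."""
        t = num_frames_after_subsampling(num_frames)
        if t < num_tokens:
            print(f"Exclude cut from training. "
                  f"Number of frames (before subsampling): {num_frames}. "
                  f"Number of frames (after subsampling): {t}. "
                  f"Number of tokens: {num_tokens}")
            return False
        return True

    keep_cut(100, 24)   # -> False, matching the warnings above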
], tot_loss[loss=0.06556, simple_loss=0.08871, pruned_loss=0.01226, audio_tagging_loss=0.008937, over 3037157.16 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:14:52,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0 2023-11-26 18:15:00,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.909e+01 9.426e+01 1.006e+02 1.618e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 18:15:11,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526650 2023-11-26 18:15:16,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3510973.3333333335, ans=0.0 2023-11-26 18:15:29,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3511040.0, ans=0.125 2023-11-26 18:15:31,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3511106.6666666665, ans=10.0 2023-11-26 18:15:43,132 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9650, loss[loss=0.08364, simple_loss=0.1191, pruned_loss=0.01604, audio_tagging_loss=0.00804, over 15935.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08808, pruned_loss=0.0122, audio_tagging_loss=0.008946, over 3038264.29 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:15:45,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2023-11-26 18:15:59,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2023-11-26 18:16:05,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3511306.6666666665, ans=0.125 2023-11-26 18:16:08,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526700 2023-11-26 18:16:40,143 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9700, loss[loss=0.07356, simple_loss=0.1001, pruned_loss=0.01744, audio_tagging_loss=0.006047, over 14936.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08959, pruned_loss=0.01232, audio_tagging_loss=0.00872, over 3039395.83 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:16:54,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.934e+01 9.029e+01 9.553e+01 1.029e+02 1.538e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 18:17:03,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3511640.0, ans=0.125 2023-11-26 18:17:04,057 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526750 2023-11-26 18:17:35,710 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9750, loss[loss=0.06951, simple_loss=0.1, pruned_loss=0.01208, audio_tagging_loss=0.007407, over 15951.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09009, pruned_loss=0.01224, audio_tagging_loss=0.008549, over 3044112.00 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:17:42,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.06 vs. 
limit=15.0 2023-11-26 18:17:45,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3511906.6666666665, ans=0.125 2023-11-26 18:17:49,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3511906.6666666665, ans=0.05 2023-11-26 18:17:57,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3511973.3333333335, ans=0.0 2023-11-26 18:17:59,897 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526800 2023-11-26 18:18:02,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3511973.3333333335, ans=0.125 2023-11-26 18:18:05,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3511973.3333333335, ans=0.0 2023-11-26 18:18:07,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3511973.3333333335, ans=0.05 2023-11-26 18:18:23,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3512106.6666666665, ans=0.02 2023-11-26 18:18:29,227 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-11-26 18:18:31,592 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9800, loss[loss=0.06215, simple_loss=0.08574, pruned_loss=0.01049, audio_tagging_loss=0.008791, over 16453.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08956, pruned_loss=0.01221, audio_tagging_loss=0.008499, over 3046081.99 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:18:40,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=15.0 2023-11-26 18:18:42,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3512240.0, ans=0.1 2023-11-26 18:18:45,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.863e+01 9.415e+01 1.026e+02 1.443e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 18:18:46,091 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:18:47,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3512240.0, ans=0.0 2023-11-26 18:18:56,665 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526850 2023-11-26 18:19:02,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3512306.6666666665, ans=0.125 2023-11-26 18:19:09,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.62 vs. 
limit=15.0 2023-11-26 18:19:09,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3512373.3333333335, ans=0.125 2023-11-26 18:19:16,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3512440.0, ans=0.125 2023-11-26 18:19:22,547 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:19:27,304 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9850, loss[loss=0.06291, simple_loss=0.08382, pruned_loss=0.01119, audio_tagging_loss=0.009808, over 15902.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08989, pruned_loss=0.0122, audio_tagging_loss=0.008517, over 3041250.69 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:19:28,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3512506.6666666665, ans=0.1 2023-11-26 18:19:40,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3512573.3333333335, ans=0.0 2023-11-26 18:19:50,252 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:19:52,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526900 2023-11-26 18:19:57,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3512640.0, ans=0.0 2023-11-26 18:20:03,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2023-11-26 18:20:18,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=12.0 2023-11-26 18:20:19,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3512773.3333333335, ans=0.0 2023-11-26 18:20:23,494 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9900, loss[loss=0.06417, simple_loss=0.08669, pruned_loss=0.01265, audio_tagging_loss=0.008178, over 15638.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09035, pruned_loss=0.01233, audio_tagging_loss=0.008479, over 3046156.64 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:20:37,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.893e+01 9.553e+01 1.033e+02 1.550e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 18:20:47,282 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526950 2023-11-26 18:20:53,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-11-26 18:20:56,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. 
limit=15.0 2023-11-26 18:21:05,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3513040.0, ans=0.0 2023-11-26 18:21:07,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3513106.6666666665, ans=0.125 2023-11-26 18:21:19,079 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9950, loss[loss=0.07691, simple_loss=0.1118, pruned_loss=0.01468, audio_tagging_loss=0.006334, over 13906.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08899, pruned_loss=0.01196, audio_tagging_loss=0.008518, over 3043178.04 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:21:33,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3513240.0, ans=0.07 2023-11-26 18:21:44,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527000 2023-11-26 18:22:10,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3513440.0, ans=0.2 2023-11-26 18:22:15,650 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10000, loss[loss=0.05735, simple_loss=0.07655, pruned_loss=0.009588, audio_tagging_loss=0.009485, over 15654.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.089, pruned_loss=0.01193, audio_tagging_loss=0.008529, over 3038364.78 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:22:26,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3513573.3333333335, ans=0.125 2023-11-26 18:22:28,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2023-11-26 18:22:29,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.698e+01 9.339e+01 1.006e+02 1.184e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:22:39,666 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527050 2023-11-26 18:22:39,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3513640.0, ans=0.025 2023-11-26 18:22:45,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3513640.0, ans=0.0 2023-11-26 18:22:51,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3513706.6666666665, ans=0.125 2023-11-26 18:23:04,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3513773.3333333335, ans=0.125 2023-11-26 18:23:04,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3513773.3333333335, ans=0.0 2023-11-26 18:23:06,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3513773.3333333335, ans=0.2 2023-11-26 18:23:11,580 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10050, loss[loss=0.05495, simple_loss=0.0665, pruned_loss=0.01077, audio_tagging_loss=0.01093, over 14441.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08956, pruned_loss=0.01206, audio_tagging_loss=0.008504, over 3042491.12 frames. 
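Each train_asr.py:1235 line prints the current batch's loss[...] beside tot_loss[...], an aggregate weighted by frame count ("over 3042491.12 frames" and similar; the total hovers near 3M frames throughout, which suggests a long running window rather than a whole-epoch sum). A frame-weighted tracker in that spirit; the windowing/decay policy is an assumption:

    class LossTracker:
        """Frame-weighted running average, one slot per loss component."""

        def __init__(self):
            self.sums: dict[str, float] = {}
            self.frames = 0.0

        def update(self, losses: dict[str, float], num_frames: float) -> None:
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames
            self.frames += num_frames

        def summary(self) -> str:
            avgs = ", ".join(f"{k}={v / self.frames:.4g}"
                             for k, v in self.sums.items())
            return f"tot_loss[{avgs}, over {self.frames:.2f} frames.]"

    t = LossTracker()
    t.update({"loss": 0.05495, "simple_loss": 0.0665}, 14441.0)
    print(t.summary())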
], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:23:30,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3513906.6666666665, ans=0.2 2023-11-26 18:23:35,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527100 2023-11-26 18:23:39,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3513973.3333333335, ans=0.0 2023-11-26 18:23:51,286 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:23:54,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3514040.0, ans=0.1 2023-11-26 18:24:04,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3514106.6666666665, ans=0.125 2023-11-26 18:24:06,815 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10100, loss[loss=0.08217, simple_loss=0.107, pruned_loss=0.01903, audio_tagging_loss=0.009652, over 15577.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08952, pruned_loss=0.01209, audio_tagging_loss=0.008602, over 3046282.61 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:24:21,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.926e+01 9.577e+01 1.044e+02 1.166e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 18:24:22,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=22.5 2023-11-26 18:24:32,287 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527150 2023-11-26 18:24:44,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2023-11-26 18:24:53,890 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:24:54,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3514440.0, ans=0.0 2023-11-26 18:24:56,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3514440.0, ans=0.125 2023-11-26 18:25:02,354 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10150, loss[loss=0.06058, simple_loss=0.08667, pruned_loss=0.009738, audio_tagging_loss=0.007509, over 15584.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08962, pruned_loss=0.01211, audio_tagging_loss=0.008719, over 3048766.87 frames. 
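grad_scale in the batch records moves only in powers of two (32, 16, 8, then back up to 32 across this excerpt): the signature of dynamic loss scaling under fp16, where the scale is halved whenever a step produces non-finite gradients and doubled again after a run of clean steps. A standard torch.cuda.amp loop showing where that number comes from; the model, optimizer, and scaler constants here are placeholders and PyTorch defaults, not necessarily the script's:

    import torch

    # Placeholder model/optimizer; assumes a CUDA device.
    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1.53e-3)
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0, backoff_factor=0.5,
        growth_interval=2000)  # PyTorch defaults except init_scale

    def train_step(feats, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(feats), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped if the scaled grads overflowed
        scaler.update()            # halve on overflow, x2 after clean interval
        return scaler.get_scale()  # the grad_scale printed in the records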
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:25:25,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3514640.0, ans=0.2 2023-11-26 18:25:26,972 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527200 2023-11-26 18:25:27,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2023-11-26 18:25:30,458 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:25:33,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3514640.0, ans=0.02 2023-11-26 18:25:43,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3514706.6666666665, ans=0.0 2023-11-26 18:25:46,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3514773.3333333335, ans=0.2 2023-11-26 18:25:58,639 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10200, loss[loss=0.05239, simple_loss=0.06151, pruned_loss=0.009085, audio_tagging_loss=0.01255, over 14047.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09003, pruned_loss=0.01215, audio_tagging_loss=0.008758, over 3048371.56 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:26:09,684 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2023-11-26 18:26:12,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3514906.6666666665, ans=0.0 2023-11-26 18:26:13,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 9.101e+01 9.803e+01 1.043e+02 1.180e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-26 18:26:15,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3514906.6666666665, ans=0.1 2023-11-26 18:26:21,429 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:26:22,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527250 2023-11-26 18:26:39,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3515040.0, ans=0.2 2023-11-26 18:26:53,043 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10250, loss[loss=0.06921, simple_loss=0.08868, pruned_loss=0.01452, audio_tagging_loss=0.01035, over 15267.00 frames. 
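The Tokens lists in these warnings are SentencePiece BPE pieces, with '▁' marking a word boundary; under this run's BPE model the placeholder transcript encodes to the 24 pieces counted in each warning. Reproducing such a tokenisation with the sentencepiece API (the model path is a placeholder, and the exact pieces depend on the trained model):

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="bpe.model")  # placeholder path
    text = "Dummy text added as a place holder. Please ignore this if possible."
    tokens = sp.encode(text, out_type=str)
    print(f"Tokens: {tokens}. Number of tokens: {len(tokens)}")
    # With the run's BPE model this yields the 24 pieces shown in the
    # warnings above; a different model would split the text differently.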
], tot_loss[loss=0.06577, simple_loss=0.08961, pruned_loss=0.01214, audio_tagging_loss=0.008819, over 3047078.68 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:26:58,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3515173.3333333335, ans=0.07 2023-11-26 18:27:08,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2023-11-26 18:27:18,156 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527300 2023-11-26 18:27:24,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3515306.6666666665, ans=0.1 2023-11-26 18:27:26,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3515373.3333333335, ans=0.125 2023-11-26 18:27:32,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3515373.3333333335, ans=0.2 2023-11-26 18:27:35,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2023-11-26 18:27:38,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3515440.0, ans=0.125 2023-11-26 18:27:45,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3515440.0, ans=0.125 2023-11-26 18:27:48,561 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10300, loss[loss=0.05542, simple_loss=0.08042, pruned_loss=0.006681, audio_tagging_loss=0.008525, over 14521.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08974, pruned_loss=0.01224, audio_tagging_loss=0.008744, over 3040417.94 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:27:49,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3515506.6666666665, ans=0.125 2023-11-26 18:27:54,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3515506.6666666665, ans=10.0 2023-11-26 18:28:01,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. 
limit=15.0 2023-11-26 18:28:05,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.902e+01 9.462e+01 1.025e+02 1.207e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 18:28:12,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3515640.0, ans=0.125 2023-11-26 18:28:13,120 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527350 2023-11-26 18:28:20,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3515706.6666666665, ans=22.5 2023-11-26 18:28:33,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3515773.3333333335, ans=0.0 2023-11-26 18:28:36,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=22.5 2023-11-26 18:28:36,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3515773.3333333335, ans=0.0 2023-11-26 18:28:45,122 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10350, loss[loss=0.09406, simple_loss=0.1306, pruned_loss=0.02275, audio_tagging_loss=0.005988, over 15349.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09123, pruned_loss=0.01253, audio_tagging_loss=0.008749, over 3049917.94 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:29:01,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3515906.6666666665, ans=0.0 2023-11-26 18:29:08,695 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527400 2023-11-26 18:29:40,531 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10400, loss[loss=0.06243, simple_loss=0.08338, pruned_loss=0.01083, audio_tagging_loss=0.00991, over 15889.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09018, pruned_loss=0.01238, audio_tagging_loss=0.008893, over 3045807.53 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:29:54,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=15.0 2023-11-26 18:29:57,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.067e+01 9.465e+01 1.042e+02 1.301e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 18:29:59,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3516240.0, ans=0.0 2023-11-26 18:30:05,658 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527450 2023-11-26 18:30:08,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2023-11-26 18:30:11,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3516306.6666666665, ans=0.0 2023-11-26 18:30:18,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.03 vs. 
limit=10.0 2023-11-26 18:30:26,528 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:30:35,792 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10450, loss[loss=0.08186, simple_loss=0.1082, pruned_loss=0.01866, audio_tagging_loss=0.009086, over 15525.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08992, pruned_loss=0.01229, audio_tagging_loss=0.008845, over 3047232.51 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:30:43,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=12.0 2023-11-26 18:30:58,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3516640.0, ans=0.125 2023-11-26 18:31:00,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3516640.0, ans=0.125 2023-11-26 18:31:01,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527500 2023-11-26 18:31:05,732 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:31:06,802 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:31:11,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3516706.6666666665, ans=0.125 2023-11-26 18:31:24,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3516773.3333333335, ans=0.0 2023-11-26 18:31:25,886 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:31:30,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3516773.3333333335, ans=0.0 2023-11-26 18:31:30,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3516773.3333333335, ans=0.125 2023-11-26 18:31:33,189 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10500, loss[loss=0.06351, simple_loss=0.0757, pruned_loss=0.01608, audio_tagging_loss=0.009584, over 15063.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09028, pruned_loss=0.01248, audio_tagging_loss=0.0087, over 3045542.82 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:31:35,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2023-11-26 18:31:48,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.975e+01 9.604e+01 1.035e+02 1.568e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-26 18:31:56,506 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527550 2023-11-26 18:31:57,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3516973.3333333335, ans=0.125 2023-11-26 18:32:17,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.60 vs. 
limit=12.0 2023-11-26 18:32:27,953 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10550, loss[loss=0.0581, simple_loss=0.08385, pruned_loss=0.007987, audio_tagging_loss=0.008184, over 15613.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09008, pruned_loss=0.01241, audio_tagging_loss=0.008593, over 3037493.10 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:32:28,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3517173.3333333335, ans=0.025 2023-11-26 18:32:33,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3517173.3333333335, ans=0.125 2023-11-26 18:32:38,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3517240.0, ans=0.2 2023-11-26 18:32:43,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-26 18:32:52,363 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527600 2023-11-26 18:33:04,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3517373.3333333335, ans=0.125 2023-11-26 18:33:07,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3517373.3333333335, ans=22.5 2023-11-26 18:33:15,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3517440.0, ans=0.2 2023-11-26 18:33:16,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3517440.0, ans=0.2 2023-11-26 18:33:23,267 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10600, loss[loss=0.06142, simple_loss=0.08625, pruned_loss=0.009992, audio_tagging_loss=0.008307, over 15048.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08981, pruned_loss=0.01235, audio_tagging_loss=0.008531, over 3032886.55 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:33:30,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2023-11-26 18:33:41,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.748e+01 9.260e+01 9.948e+01 1.249e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 18:33:49,073 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527650 2023-11-26 18:33:57,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3517706.6666666665, ans=0.0 2023-11-26 18:34:13,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3517773.3333333335, ans=0.125 2023-11-26 18:34:20,607 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10650, loss[loss=0.06938, simple_loss=0.09738, pruned_loss=0.01582, audio_tagging_loss=0.004864, over 15468.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09033, pruned_loss=0.01249, audio_tagging_loss=0.008444, over 3034958.21 frames. 
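
On the optim.py clipping lines: the five numbers are quartiles (min, 25%, median, 75%, max) of recently observed gradient norms, and in every such entry in this log the printed threshold equals Clipping_scale times the median (for the line above, 2.0 * 9.260e+01 = 1.852e+02). A sketch of that bookkeeping; the window size and the exact definition of percent-clipped are assumptions:

    import numpy as np

    # Sketch of grad-norm bookkeeping consistent with the optim.py lines:
    # threshold = clipping_scale * median of recent norms. The buffer length
    # and the percent-clipped definition below are assumptions.
    def clipping_stats(recent_norms, clipping_scale=2.0):
        q = np.percentile(recent_norms, [0, 25, 50, 75, 100])
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * float(np.mean(np.asarray(recent_norms) > threshold))
        return q, threshold, percent_clipped

    norms = np.random.default_rng(0).normal(90.0, 10.0, size=128)
    print(clipping_stats(norms))
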
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:34:44,376 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527700 2023-11-26 18:34:46,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3517973.3333333335, ans=0.125 2023-11-26 18:35:02,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3518040.0, ans=0.1 2023-11-26 18:35:07,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3518106.6666666665, ans=0.0 2023-11-26 18:35:16,066 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10700, loss[loss=0.09326, simple_loss=0.1351, pruned_loss=0.01911, audio_tagging_loss=0.006625, over 15946.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.0901, pruned_loss=0.01236, audio_tagging_loss=0.008523, over 3036065.19 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:35:20,503 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:35:21,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2023-11-26 18:35:32,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.938e+01 9.482e+01 1.003e+02 1.497e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 18:35:40,330 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527750 2023-11-26 18:35:50,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3518373.3333333335, ans=0.1 2023-11-26 18:35:51,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3518373.3333333335, ans=0.125 2023-11-26 18:35:53,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3518373.3333333335, ans=0.125 2023-11-26 18:36:03,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3518440.0, ans=0.125 2023-11-26 18:36:11,557 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10750, loss[loss=0.0658, simple_loss=0.09011, pruned_loss=0.01233, audio_tagging_loss=0.008419, over 14254.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09007, pruned_loss=0.01222, audio_tagging_loss=0.008478, over 3037541.68 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:36:16,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3518506.6666666665, ans=0.125 2023-11-26 18:36:16,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3518506.6666666665, ans=0.125 2023-11-26 18:36:16,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. 
limit=15.0 2023-11-26 18:36:19,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3518506.6666666665, ans=0.125 2023-11-26 18:36:27,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3518573.3333333335, ans=0.125 2023-11-26 18:36:30,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3518573.3333333335, ans=0.07 2023-11-26 18:36:37,333 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527800 2023-11-26 18:37:08,282 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10800, loss[loss=0.05819, simple_loss=0.08789, pruned_loss=0.008549, audio_tagging_loss=0.005697, over 14724.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09092, pruned_loss=0.01244, audio_tagging_loss=0.008371, over 3044928.56 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:37:12,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3518840.0, ans=0.1 2023-11-26 18:37:22,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3518906.6666666665, ans=0.0 2023-11-26 18:37:25,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.760e+01 9.363e+01 9.951e+01 1.169e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 18:37:33,083 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527850 2023-11-26 18:37:45,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.15 vs. limit=22.5 2023-11-26 18:38:04,873 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10850, loss[loss=0.05629, simple_loss=0.06851, pruned_loss=0.01072, audio_tagging_loss=0.01132, over 14445.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09152, pruned_loss=0.01262, audio_tagging_loss=0.008462, over 3043415.09 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:38:08,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3519173.3333333335, ans=0.0 2023-11-26 18:38:26,386 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:38:28,368 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527900 2023-11-26 18:38:44,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3519373.3333333335, ans=0.125 2023-11-26 18:38:48,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3519440.0, ans=0.035 2023-11-26 18:38:50,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3519440.0, ans=0.2 2023-11-26 18:38:59,181 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:39:00,247 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10900, loss[loss=0.05593, simple_loss=0.07547, pruned_loss=0.008239, audio_tagging_loss=0.009959, over 15248.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09109, pruned_loss=0.0125, audio_tagging_loss=0.008617, over 3044145.00 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:39:18,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.701e+01 9.472e+01 1.014e+02 1.998e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 18:39:25,379 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527950 2023-11-26 18:39:42,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2023-11-26 18:39:43,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3519773.3333333335, ans=0.125 2023-11-26 18:39:55,872 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10950, loss[loss=0.05458, simple_loss=0.07332, pruned_loss=0.009776, audio_tagging_loss=0.008142, over 14509.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09088, pruned_loss=0.01255, audio_tagging_loss=0.008673, over 3041504.74 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:39:58,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3519840.0, ans=0.125 2023-11-26 18:40:05,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3519840.0, ans=0.1 2023-11-26 18:40:09,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3519906.6666666665, ans=0.1 2023-11-26 18:40:13,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3519906.6666666665, ans=0.125 2023-11-26 18:40:21,076 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528000 2023-11-26 18:40:37,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3520040.0, ans=0.125 2023-11-26 18:40:38,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520040.0, ans=0.1 2023-11-26 18:40:54,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3520173.3333333335, ans=0.0 2023-11-26 18:40:54,841 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11000, loss[loss=0.05736, simple_loss=0.07913, pruned_loss=0.009264, audio_tagging_loss=0.008532, over 15209.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09045, pruned_loss=0.01238, audio_tagging_loss=0.008675, over 3034917.65 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:41:06,042 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:41:10,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3520240.0, ans=0.0 2023-11-26 18:41:12,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.767e+01 9.294e+01 1.014e+02 1.282e+02, threshold=1.859e+02, percent-clipped=1.0 2023-11-26 18:41:13,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3520240.0, ans=0.1 2023-11-26 18:41:18,941 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528050 2023-11-26 18:41:24,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3520306.6666666665, ans=0.2 2023-11-26 18:41:41,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3520440.0, ans=0.125 2023-11-26 18:41:50,613 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11050, loss[loss=0.05213, simple_loss=0.07082, pruned_loss=0.006622, audio_tagging_loss=0.0101, over 16497.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09029, pruned_loss=0.01231, audio_tagging_loss=0.008701, over 3043195.80 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:41:52,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0 2023-11-26 18:41:57,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3520506.6666666665, ans=0.125 2023-11-26 18:42:06,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3520573.3333333335, ans=0.0 2023-11-26 18:42:14,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3520640.0, ans=0.125 2023-11-26 18:42:15,734 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528100 2023-11-26 18:42:19,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3520640.0, ans=0.125 2023-11-26 18:42:22,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3520640.0, ans=0.125 2023-11-26 18:42:27,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3520706.6666666665, ans=0.125 2023-11-26 18:42:30,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3520706.6666666665, ans=0.0 2023-11-26 18:42:46,084 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11100, loss[loss=0.0762, simple_loss=0.1081, pruned_loss=0.0156, audio_tagging_loss=0.006568, over 15695.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.0904, pruned_loss=0.01248, audio_tagging_loss=0.008811, over 3044723.78 frames. 
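
The grad_scale field moves in powers of two (32, 16, 8, 16, 32 across the batches above), the signature of dynamic fp16 loss scaling: the scale is halved when scaled gradients overflow and grown back after a run of clean steps. A generic PyTorch sketch of that loop; icefall wraps its own scaler, so this shows the standard torch.cuda.amp pattern rather than the exact training code, and it assumes a CUDA device:

    import torch

    # Minimal sketch of dynamic fp16 loss scaling, the mechanism behind the
    # grad_scale column. Illustrative stand-in model and optimizer.
    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    for _ in range(10):
        x = torch.randn(8, 80, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).square().mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped when scaled grads contain inf/nan
        scaler.update()          # halve on overflow, grow after clean steps
        print(scaler.get_scale())  # the value that appears as grad_scale
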
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:42:51,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3520840.0, ans=0.125 2023-11-26 18:43:04,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.164e+01 1.008e+02 1.089e+02 1.427e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-26 18:43:07,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520906.6666666665, ans=0.1 2023-11-26 18:43:11,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528150 2023-11-26 18:43:41,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3521106.6666666665, ans=0.125 2023-11-26 18:43:43,191 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11150, loss[loss=0.06537, simple_loss=0.08563, pruned_loss=0.01429, audio_tagging_loss=0.008263, over 15403.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09077, pruned_loss=0.0125, audio_tagging_loss=0.00888, over 3047951.06 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:43:57,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3521240.0, ans=0.125 2023-11-26 18:44:05,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3521306.6666666665, ans=15.0 2023-11-26 18:44:07,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528200 2023-11-26 18:44:23,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3521373.3333333335, ans=0.125 2023-11-26 18:44:24,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521373.3333333335, ans=0.1 2023-11-26 18:44:39,045 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11200, loss[loss=0.05862, simple_loss=0.07194, pruned_loss=0.008732, audio_tagging_loss=0.01392, over 16369.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09086, pruned_loss=0.01252, audio_tagging_loss=0.008924, over 3051986.94 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:44:41,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3521506.6666666665, ans=0.0 2023-11-26 18:44:44,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.30 vs. 
limit=15.0 2023-11-26 18:44:50,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521573.3333333335, ans=0.1 2023-11-26 18:44:58,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.872e+01 9.383e+01 1.014e+02 1.200e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 18:45:04,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528250 2023-11-26 18:45:30,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3521773.3333333335, ans=0.0 2023-11-26 18:45:33,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3521840.0, ans=0.125 2023-11-26 18:45:33,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0 2023-11-26 18:45:34,280 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11250, loss[loss=0.0495, simple_loss=0.06446, pruned_loss=0.008099, audio_tagging_loss=0.009172, over 15298.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08952, pruned_loss=0.01246, audio_tagging_loss=0.009045, over 3040518.09 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:45:36,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3521840.0, ans=0.0 2023-11-26 18:45:45,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3521906.6666666665, ans=0.0 2023-11-26 18:45:57,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3521973.3333333335, ans=0.125 2023-11-26 18:45:59,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528300 2023-11-26 18:46:10,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3522040.0, ans=0.5 2023-11-26 18:46:12,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2023-11-26 18:46:14,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3522040.0, ans=0.04949747468305833 2023-11-26 18:46:31,392 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11300, loss[loss=0.06045, simple_loss=0.08506, pruned_loss=0.01105, audio_tagging_loss=0.00687, over 14388.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08959, pruned_loss=0.01256, audio_tagging_loss=0.008934, over 3039417.07 frames. 
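
The scaling.py ScheduledFloat lines are schedule lookups: a named hyperparameter (dropout_p, skip rates, balancer probs, bypass scale_min, and so on) evaluated at the current batch_count, with ans= the value currently in force. A plausible minimal model is piecewise-linear interpolation over (batch_count, value) breakpoints, clamped at both ends; the breakpoints below are invented for illustration:

    # Sketch of a batch-count-keyed schedule like the ScheduledFloat lines:
    # piecewise-linear between (batch_count, value) breakpoints, clamped at
    # the ends. The breakpoints here are hypothetical.
    def scheduled_float(points, batch_count):
        points = sorted(points)
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = [(0.0, 0.3), (20000.0, 0.1)]       # hypothetical schedule
    print(scheduled_float(dropout_p, 3519906.67))  # 0.1: past the last breakpoint, cf. ans=0.1 above
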
], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:46:39,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3522173.3333333335, ans=0.04949747468305833 2023-11-26 18:46:41,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3522240.0, ans=0.0 2023-11-26 18:46:51,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.924e+01 9.340e+01 1.002e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:46:52,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3522306.6666666665, ans=0.125 2023-11-26 18:46:55,416 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528350 2023-11-26 18:47:20,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3522440.0, ans=0.0 2023-11-26 18:47:26,475 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11350, loss[loss=0.05893, simple_loss=0.0855, pruned_loss=0.007767, audio_tagging_loss=0.008413, over 15792.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08983, pruned_loss=0.0126, audio_tagging_loss=0.008752, over 3046714.83 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:47:27,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3522506.6666666665, ans=0.0 2023-11-26 18:47:33,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3522506.6666666665, ans=0.125 2023-11-26 18:47:41,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3522573.3333333335, ans=0.125 2023-11-26 18:47:50,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3522640.0, ans=0.0 2023-11-26 18:47:51,559 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528400 2023-11-26 18:47:53,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3522640.0, ans=0.125 2023-11-26 18:47:57,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3522640.0, ans=0.125 2023-11-26 18:48:10,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-26 18:48:22,538 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11400, loss[loss=0.08789, simple_loss=0.1267, pruned_loss=0.01808, audio_tagging_loss=0.00647, over 15353.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09003, pruned_loss=0.01242, audio_tagging_loss=0.008612, over 3045891.08 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:48:24,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.82 vs. 
limit=15.0 2023-11-26 18:48:43,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.001e+01 9.594e+01 1.048e+02 1.378e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 18:48:46,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3522973.3333333335, ans=0.125 2023-11-26 18:48:47,629 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528450 2023-11-26 18:49:00,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3523040.0, ans=0.1 2023-11-26 18:49:05,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3523040.0, ans=0.0 2023-11-26 18:49:12,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3523106.6666666665, ans=0.0 2023-11-26 18:49:19,512 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11450, loss[loss=0.06222, simple_loss=0.07886, pruned_loss=0.0128, audio_tagging_loss=0.009983, over 14170.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08942, pruned_loss=0.01234, audio_tagging_loss=0.008583, over 3042692.48 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:49:40,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3523306.6666666665, ans=0.1 2023-11-26 18:49:42,822 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528500 2023-11-26 18:49:47,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3523306.6666666665, ans=0.2 2023-11-26 18:50:01,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2023-11-26 18:50:03,268 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:50:08,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2023-11-26 18:50:11,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3523440.0, ans=0.0 2023-11-26 18:50:12,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3523440.0, ans=0.125 2023-11-26 18:50:14,552 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11500, loss[loss=0.0748, simple_loss=0.1138, pruned_loss=0.01165, audio_tagging_loss=0.006231, over 15152.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08937, pruned_loss=0.01233, audio_tagging_loss=0.008631, over 3042316.86 frames. 
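
The Whitening lines report a measured statistic against a limit (metric=5.82 vs. limit=15.0 above), and the WithLoss lines report loss-sum=0.000e+00; both patterns look like auxiliary regularizers that stay inactive while the statistic is under its limit. A purely schematic sketch of that gating; how metric itself is computed is not reconstructed here:

    # Schematic only: the shape of a limit-gated auxiliary penalty suggested
    # by the "metric=... vs. limit=..." and "loss-sum=0.000e+00" lines.
    def limit_gated_penalty(metric: float, limit: float, weight: float = 1.0) -> float:
        return weight * max(0.0, metric - limit)

    print(limit_gated_penalty(5.82, 15.0))   # 0.0 -> nothing added to the loss
    print(limit_gated_penalty(16.0, 15.0))   # 1.0 (hypothetical over-limit case)
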
], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:50:16,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3523506.6666666665, ans=0.125 2023-11-26 18:50:34,141 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.724e+01 9.278e+01 1.017e+02 1.417e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 18:50:36,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3523640.0, ans=0.0 2023-11-26 18:50:39,558 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528550 2023-11-26 18:50:56,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3523706.6666666665, ans=0.1 2023-11-26 18:51:00,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3523773.3333333335, ans=0.0 2023-11-26 18:51:09,833 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11550, loss[loss=0.08597, simple_loss=0.1277, pruned_loss=0.01566, audio_tagging_loss=0.006457, over 15214.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08922, pruned_loss=0.01225, audio_tagging_loss=0.008594, over 3043446.49 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:51:16,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3523840.0, ans=0.015 2023-11-26 18:51:23,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3523906.6666666665, ans=0.125 2023-11-26 18:51:28,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3523906.6666666665, ans=0.1 2023-11-26 18:51:35,393 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-26 18:51:47,424 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:51:47,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3524040.0, ans=0.125 2023-11-26 18:52:06,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3524173.3333333335, ans=0.0 2023-11-26 18:52:07,220 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11600, loss[loss=0.06617, simple_loss=0.09651, pruned_loss=0.01122, audio_tagging_loss=0.006692, over 15044.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08877, pruned_loss=0.01211, audio_tagging_loss=0.008658, over 3044760.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:52:26,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.975e+01 9.642e+01 1.029e+02 1.280e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 18:52:28,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. 
limit=15.0 2023-11-26 18:52:31,128 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-26 18:52:36,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-26 18:52:56,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3524440.0, ans=0.2 2023-11-26 18:53:02,898 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11650, loss[loss=0.04781, simple_loss=0.06164, pruned_loss=0.004487, audio_tagging_loss=0.0125, over 14724.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08798, pruned_loss=0.01184, audio_tagging_loss=0.008734, over 3051350.25 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:53:07,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3524506.6666666665, ans=0.2 2023-11-26 18:53:20,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3524573.3333333335, ans=0.0 2023-11-26 18:53:27,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-26 18:53:30,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3524640.0, ans=0.0 2023-11-26 18:53:52,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3524773.3333333335, ans=0.2 2023-11-26 18:53:57,975 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11700, loss[loss=0.07598, simple_loss=0.1007, pruned_loss=0.0144, audio_tagging_loss=0.01122, over 15507.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08755, pruned_loss=0.01186, audio_tagging_loss=0.008806, over 3051739.10 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:54:13,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3524906.6666666665, ans=0.125 2023-11-26 18:54:19,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.034e+01 9.030e+01 9.498e+01 1.025e+02 1.677e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:54:23,685 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-26 18:54:55,278 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11750, loss[loss=0.08722, simple_loss=0.1206, pruned_loss=0.019, audio_tagging_loss=0.0079, over 16299.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08796, pruned_loss=0.01187, audio_tagging_loss=0.008813, over 3049493.56 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:55:10,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3525240.0, ans=0.07 2023-11-26 18:55:15,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3525240.0, ans=0.125 2023-11-26 18:55:15,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3525240.0, ans=0.1 2023-11-26 18:55:19,240 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-26 18:55:32,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. 
limit=6.0 2023-11-26 18:55:51,236 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11800, loss[loss=0.07296, simple_loss=0.1065, pruned_loss=0.01199, audio_tagging_loss=0.007733, over 15151.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08801, pruned_loss=0.01201, audio_tagging_loss=0.00883, over 3044472.33 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:55:56,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3525506.6666666665, ans=0.0 2023-11-26 18:56:01,079 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:56:10,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.967e+01 9.583e+01 1.033e+02 1.275e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 18:56:14,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-26 18:56:27,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3525706.6666666665, ans=0.125 2023-11-26 18:56:29,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3525706.6666666665, ans=0.125 2023-11-26 18:56:33,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.37 vs. limit=10.0 2023-11-26 18:56:34,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.85 vs. limit=10.0 2023-11-26 18:56:37,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3525773.3333333335, ans=0.0 2023-11-26 18:56:40,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3525773.3333333335, ans=0.125 2023-11-26 18:56:46,545 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11850, loss[loss=0.0541, simple_loss=0.06737, pruned_loss=0.009524, audio_tagging_loss=0.01089, over 12951.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08775, pruned_loss=0.01204, audio_tagging_loss=0.008861, over 3038924.63 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:56:54,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3525840.0, ans=0.2 2023-11-26 18:56:55,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=10.0 2023-11-26 18:57:03,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.86 vs. 
limit=12.0 2023-11-26 18:57:05,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3525906.6666666665, ans=0.125 2023-11-26 18:57:05,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3525906.6666666665, ans=0.0 2023-11-26 18:57:12,175 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-26 18:57:25,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3526040.0, ans=0.125 2023-11-26 18:57:42,552 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11900, loss[loss=0.06788, simple_loss=0.08701, pruned_loss=0.01629, audio_tagging_loss=0.008085, over 15144.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08913, pruned_loss=0.0123, audio_tagging_loss=0.008863, over 3044726.99 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:58:02,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.907e+01 9.565e+01 1.014e+02 1.302e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 18:58:07,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-26 18:58:17,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3526373.3333333335, ans=0.125 2023-11-26 18:58:39,292 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11950, loss[loss=0.04833, simple_loss=0.05868, pruned_loss=0.006223, audio_tagging_loss=0.01277, over 14558.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08885, pruned_loss=0.01218, audio_tagging_loss=0.008903, over 3039250.30 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:58:43,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3526506.6666666665, ans=0.1 2023-11-26 18:58:53,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-26 18:58:59,753 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:59:02,675 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-26 18:59:09,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3526640.0, ans=0.0 2023-11-26 18:59:18,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3526706.6666666665, ans=0.0 2023-11-26 18:59:24,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3526773.3333333335, ans=0.0 2023-11-26 18:59:30,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=22.5 2023-11-26 18:59:34,101 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 12000, loss[loss=0.08599, simple_loss=0.1229, pruned_loss=0.01771, audio_tagging_loss=0.006812, over 14982.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08912, pruned_loss=0.01221, audio_tagging_loss=0.008862, over 3040057.82 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:59:34,101 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 18:59:56,271 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1811, 3.9784, 3.7905, 3.2294], device='cuda:2') 2023-11-26 19:00:06,919 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05801, simple_loss=0.05056, pruned_loss=0.005309, audio_tagging_loss=0.02742, over 4681554.00 frames. 2023-11-26 19:00:06,919 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 19:00:09,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3526840.0, ans=0.125 2023-11-26 19:00:25,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.909e+01 9.466e+01 1.042e+02 1.234e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 19:00:29,516 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-26 19:00:29,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3526973.3333333335, ans=0.2 2023-11-26 19:01:05,863 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 0, loss[loss=0.06751, simple_loss=0.07492, pruned_loss=0.008768, audio_tagging_loss=0.02128, over 14116.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.07492, pruned_loss=0.008768, audio_tagging_loss=0.02128, over 14116.00 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:01:05,864 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 19:01:37,710 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05755, simple_loss=0.05055, pruned_loss=0.005302, audio_tagging_loss=0.02697, over 4681554.00 frames. 2023-11-26 19:01:37,710 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 19:01:38,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-26 19:01:57,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3527080.0, ans=10.0 2023-11-26 19:02:01,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3527146.6666666665, ans=0.125 2023-11-26 19:02:21,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-26 19:02:28,764 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-26 19:02:32,907 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 50, loss[loss=0.07234, simple_loss=0.09415, pruned_loss=0.01211, audio_tagging_loss=0.01316, over 16878.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.09318, pruned_loss=0.0129, audio_tagging_loss=0.01623, over 687431.95 frames. 
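
The validation block above also logs the peak device memory for this rank. That figure corresponds to torch's allocation high-water mark, reported in MB; a one-line sketch, using this rank's device:

    import torch

    # Sketch: the "Maximum memory allocated so far is 26096MB" line matches
    # torch's allocation high-water mark for the device, in MB.
    peak_mb = torch.cuda.max_memory_allocated(torch.device("cuda:2")) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
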
], batch size: 64, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:02:36,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3527346.6666666665, ans=0.5 2023-11-26 19:02:39,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3527346.6666666665, ans=0.05 2023-11-26 19:02:44,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3527413.3333333335, ans=0.125 2023-11-26 19:02:44,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3527413.3333333335, ans=0.1 2023-11-26 19:02:54,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2023-11-26 19:03:12,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3527546.6666666665, ans=0.1 2023-11-26 19:03:13,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2023-11-26 19:03:20,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.836e+01 9.859e+01 1.043e+02 1.139e+02 1.375e+02, threshold=2.086e+02, percent-clipped=0.0 2023-11-26 19:03:23,766 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529150 2023-11-26 19:03:28,435 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 100, loss[loss=0.07294, simple_loss=0.09189, pruned_loss=0.01391, audio_tagging_loss=0.01309, over 14245.00 frames. ], tot_loss[loss=0.07515, simple_loss=0.09348, pruned_loss=0.01281, audio_tagging_loss=0.01561, over 1209565.86 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:03:38,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3527746.6666666665, ans=0.0 2023-11-26 19:03:47,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-26 19:03:47,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-26 19:03:49,653 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:03:59,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3527813.3333333335, ans=0.0 2023-11-26 19:04:01,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3527880.0, ans=0.0 2023-11-26 19:04:11,971 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.98 vs. limit=15.0 2023-11-26 19:04:12,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3527946.6666666665, ans=0.125 2023-11-26 19:04:19,247 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529200 2023-11-26 19:04:23,740 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 150, loss[loss=0.0574, simple_loss=0.07679, pruned_loss=0.007838, audio_tagging_loss=0.01117, over 15823.00 frames. 
], tot_loss[loss=0.07191, simple_loss=0.09139, pruned_loss=0.0121, audio_tagging_loss=0.01411, over 1626705.83 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:04:26,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3528013.3333333335, ans=0.5 2023-11-26 19:04:43,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3528080.0, ans=0.125 2023-11-26 19:04:44,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-26 19:04:50,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-26 19:04:51,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-26 19:04:51,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3528146.6666666665, ans=0.07 2023-11-26 19:04:54,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=15.0 2023-11-26 19:04:58,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3528213.3333333335, ans=0.2 2023-11-26 19:04:59,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3528213.3333333335, ans=0.0 2023-11-26 19:05:11,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.212e+01 9.845e+01 1.053e+02 1.367e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-26 19:05:14,990 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529250 2023-11-26 19:05:19,229 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 200, loss[loss=0.05203, simple_loss=0.07091, pruned_loss=0.008544, audio_tagging_loss=0.008032, over 15267.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09121, pruned_loss=0.01199, audio_tagging_loss=0.01261, over 1942902.34 frames. 
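Editor's note: the optim.py lines report five grad-norm statistics (min, 25%, 50%, 75%, max), and the printed threshold is consistently Clipping_scale times the median: here 2.0 x 9.845e+01 = 1.969e+02, and the same relation holds on the other quartile lines in this stretch. Gradients are therefore clipped relative to their own recent statistics rather than to a fixed constant. A hedged sketch of that bookkeeping (the real ScaledAdam logic in optim.py is more involved):

```python
import torch

def clip_by_median(recent_norms: torch.Tensor, grad: torch.Tensor,
                   clipping_scale: float = 2.0):
    """Clip `grad` to clipping_scale * median of recent gradient norms.

    recent_norms: 1-D float tensor of grad norms from recent batches.
    Returns (clipped_grad, was_clipped). Sketch only.
    """
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]        # 2.0 * median, as in the log
    norm = grad.norm()
    if norm > threshold:
        return grad * (threshold / norm), True
    return grad, False

# Quartiles from the log line above: threshold becomes 2.0 * 98.45 = 196.9
norms = torch.tensor([73.92, 92.12, 98.45, 105.3, 136.7])
g = torch.randn(100) * 50
print(clip_by_median(norms, g))
```

percent-clipped then reports how often that threshold actually bit; it stays at 0.0 through most of this stretch and only rises when an outlier norm appears.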
], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:05:29,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3528413.3333333335, ans=0.0 2023-11-26 19:05:34,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-26 19:05:39,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3528413.3333333335, ans=0.0 2023-11-26 19:05:50,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3528480.0, ans=0.125 2023-11-26 19:05:56,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3528546.6666666665, ans=0.125 2023-11-26 19:06:01,328 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:06:07,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-26 19:06:08,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-26 19:06:09,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529300 2023-11-26 19:06:13,961 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 250, loss[loss=0.05719, simple_loss=0.07693, pruned_loss=0.01007, audio_tagging_loss=0.008651, over 15704.00 frames. ], tot_loss[loss=0.06856, simple_loss=0.09049, pruned_loss=0.01187, audio_tagging_loss=0.01144, over 2180214.17 frames. ], batch size: 64, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:06:15,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3528680.0, ans=0.1 2023-11-26 19:06:22,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3528680.0, ans=0.125 2023-11-26 19:06:24,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-11-26 19:06:28,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2023-11-26 19:06:28,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2023-11-26 19:06:46,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3528880.0, ans=0.1 2023-11-26 19:06:49,064 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:07:00,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.80 vs. 
limit=15.0 2023-11-26 19:07:01,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 9.038e+01 9.703e+01 1.049e+02 1.454e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-26 19:07:05,754 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529350 2023-11-26 19:07:09,936 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 300, loss[loss=0.07308, simple_loss=0.09327, pruned_loss=0.02053, audio_tagging_loss=0.005909, over 15806.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.08948, pruned_loss=0.01204, audio_tagging_loss=0.01068, over 2372943.95 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:07:17,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3529013.3333333335, ans=0.125 2023-11-26 19:07:18,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3529013.3333333335, ans=0.125 2023-11-26 19:07:18,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-26 19:07:22,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3529080.0, ans=0.125 2023-11-26 19:07:54,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3529280.0, ans=0.0 2023-11-26 19:08:00,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529400 2023-11-26 19:08:05,808 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 350, loss[loss=0.07282, simple_loss=0.1021, pruned_loss=0.01301, audio_tagging_loss=0.00874, over 15369.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.08992, pruned_loss=0.01209, audio_tagging_loss=0.01006, over 2526072.90 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:08:13,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3529346.6666666665, ans=0.125 2023-11-26 19:08:15,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3529413.3333333335, ans=0.2 2023-11-26 19:08:19,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-26 19:08:24,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-26 19:08:30,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529480.0, ans=0.1 2023-11-26 19:08:30,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529480.0, ans=0.1 2023-11-26 19:08:37,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.69 vs. 
limit=15.0 2023-11-26 19:08:53,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.894e+01 9.566e+01 1.035e+02 1.216e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 19:08:56,493 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529450 2023-11-26 19:09:00,674 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 400, loss[loss=0.06451, simple_loss=0.09007, pruned_loss=0.0116, audio_tagging_loss=0.007875, over 15979.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.08998, pruned_loss=0.01225, audio_tagging_loss=0.009734, over 2639781.31 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:09:05,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3529680.0, ans=0.125 2023-11-26 19:09:19,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-11-26 19:09:33,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3529880.0, ans=0.0 2023-11-26 19:09:52,818 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529500 2023-11-26 19:09:54,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3529946.6666666665, ans=0.125 2023-11-26 19:09:57,579 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 450, loss[loss=0.04853, simple_loss=0.06093, pruned_loss=0.006883, audio_tagging_loss=0.01118, over 15937.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08907, pruned_loss=0.01214, audio_tagging_loss=0.009587, over 2730558.32 frames. ], batch size: 63, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:10:15,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3530080.0, ans=0.125 2023-11-26 19:10:23,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3530146.6666666665, ans=0.015 2023-11-26 19:10:26,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3530146.6666666665, ans=0.125 2023-11-26 19:10:40,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2023-11-26 19:10:44,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2023-11-26 19:10:46,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.546e+01 9.095e+01 1.009e+02 1.358e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-26 19:10:48,785 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529550 2023-11-26 19:10:53,023 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 500, loss[loss=0.05725, simple_loss=0.07159, pruned_loss=0.008769, audio_tagging_loss=0.01269, over 15910.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08818, pruned_loss=0.01185, audio_tagging_loss=0.009332, over 2800156.26 frames. 
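Editor's note: the Whitening lines compare a per-module metric against a limit (e.g. metric=8.35 vs. limit=15.0 above). The metric evidently measures how far the activation covariance is from white (identity-like), with a correction presumably applied only once the limit is exceeded. One plausible formulation, as an illustrative sketch (scaling.py's exact definition may differ): the ratio of the mean squared eigenvalue of the covariance to the squared mean eigenvalue, which equals 1.0 for perfectly white features and grows as energy concentrates in a few directions.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Whiteness of activations x of shape (num_frames, num_channels).

    Returns mean(eig(cov)^2) / mean(eig(cov))^2 averaged over channel
    groups: 1.0 when the covariance is a multiple of the identity.
    Sketch only, not scaling.py's actual formula.
    """
    n, c = x.shape
    xg = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    cov = xg.transpose(1, 2) @ xg / n                  # (groups, d, d)
    d = cov.shape[-1]
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)   # trace/d
    mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / d         # trace(cov^2)/d
    return (mean_sq_eig / mean_eig ** 2).mean()

x = torch.randn(1000, 256)   # near-white input -> metric close to 1.0
print(whitening_metric(x))
```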
], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:10:59,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3530346.6666666665, ans=0.0 2023-11-26 19:11:34,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3530546.6666666665, ans=0.125 2023-11-26 19:11:42,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3530613.3333333335, ans=0.2 2023-11-26 19:11:44,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529600 2023-11-26 19:11:49,416 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 550, loss[loss=0.05872, simple_loss=0.08234, pruned_loss=0.009285, audio_tagging_loss=0.008268, over 14158.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08871, pruned_loss=0.01194, audio_tagging_loss=0.009179, over 2854038.89 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:11:58,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3530680.0, ans=0.1 2023-11-26 19:12:05,624 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2023-11-26 19:12:12,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-26 19:12:12,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3530813.3333333335, ans=0.0 2023-11-26 19:12:30,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3530880.0, ans=0.0 2023-11-26 19:12:35,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3530946.6666666665, ans=0.2 2023-11-26 19:12:39,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 8.958e+01 9.518e+01 1.034e+02 1.414e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 19:12:41,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529650 2023-11-26 19:12:46,994 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 600, loss[loss=0.06873, simple_loss=0.09193, pruned_loss=0.01442, audio_tagging_loss=0.008352, over 15601.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08839, pruned_loss=0.01199, audio_tagging_loss=0.00918, over 2897280.21 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:13:08,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3531146.6666666665, ans=0.2 2023-11-26 19:13:10,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-26 19:13:15,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-26 19:13:16,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3531146.6666666665, ans=0.05 2023-11-26 19:13:23,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3531213.3333333335, ans=0.1 2023-11-26 19:13:29,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3531213.3333333335, ans=0.2 2023-11-26 19:13:33,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3531280.0, ans=0.035 2023-11-26 19:13:37,421 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529700 2023-11-26 19:13:41,614 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 650, loss[loss=0.07642, simple_loss=0.1018, pruned_loss=0.01648, audio_tagging_loss=0.009019, over 15880.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08966, pruned_loss=0.0122, audio_tagging_loss=0.009008, over 2933439.64 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:13:43,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-26 19:13:48,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3531346.6666666665, ans=0.1 2023-11-26 19:13:58,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2023-11-26 19:14:15,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=8.0 2023-11-26 19:14:20,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2023-11-26 19:14:26,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3531613.3333333335, ans=0.125 2023-11-26 19:14:29,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 9.030e+01 9.559e+01 1.040e+02 1.405e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 19:14:32,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529750 2023-11-26 19:14:33,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3531613.3333333335, ans=0.125 2023-11-26 19:14:36,331 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 700, loss[loss=0.0566, simple_loss=0.08001, pruned_loss=0.007939, audio_tagging_loss=0.008651, over 15621.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08988, pruned_loss=0.01227, audio_tagging_loss=0.008918, over 2963222.70 frames. 
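Editor's note: the reported totals are internally consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. For the batch-700 running average just above, 0.5 * 0.08988 + 0.01227 + 0.008918 = 0.06613, matching the logged loss, and the same identity holds for the other batch lines in this section. Assuming those two scale factors (0.5 on the simple transducer loss, 1.0 on the audio-tagging distillation loss) are what train_asr.py applies, the combination reduces to:

```python
def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float,
               simple_scale: float = 0.5,
               audio_tagging_scale: float = 1.0) -> float:
    """Combine the per-batch loss terms the way the log reports them.

    The scale factors are inferred from the logged numbers, not read
    from train_asr.py itself.
    """
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Batch-700 running averages from the log line above:
print(total_loss(0.08988, 0.01227, 0.008918))   # ~0.06613
```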
], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:14:39,584 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. limit=10.0 2023-11-26 19:14:41,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3531680.0, ans=0.0 2023-11-26 19:14:45,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3531680.0, ans=0.2 2023-11-26 19:14:47,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3531746.6666666665, ans=0.2 2023-11-26 19:14:47,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2023-11-26 19:14:55,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3531746.6666666665, ans=0.0 2023-11-26 19:15:00,567 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:15:12,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3531880.0, ans=0.125 2023-11-26 19:15:19,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3531946.6666666665, ans=0.125 2023-11-26 19:15:22,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2023-11-26 19:15:27,960 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529800 2023-11-26 19:15:32,426 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 750, loss[loss=0.09124, simple_loss=0.1226, pruned_loss=0.02284, audio_tagging_loss=0.007079, over 14548.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09027, pruned_loss=0.01221, audio_tagging_loss=0.008878, over 2980735.24 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:16:00,205 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.01 vs. limit=22.5 2023-11-26 19:16:02,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2023-11-26 19:16:06,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3532213.3333333335, ans=0.125 2023-11-26 19:16:09,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3532213.3333333335, ans=0.125 2023-11-26 19:16:22,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.840e+01 9.480e+01 1.009e+02 1.765e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 19:16:23,753 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529850 2023-11-26 19:16:27,891 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 800, loss[loss=0.06857, simple_loss=0.08981, pruned_loss=0.01319, audio_tagging_loss=0.01047, over 14866.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08994, pruned_loss=0.01224, audio_tagging_loss=0.009029, over 3000357.84 frames. 
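Editor's note: grad_scale in the batch lines is the mixed-precision loss scale, and it moves between 32.0, 16.0 and 8.0 across this stretch (down to 8.0 by batch 750 above, back to 32.0 by batch 1200): the scaler halves on an overflowing step and doubles back after a run of clean steps. That is standard torch.cuda.amp.GradScaler behaviour; a minimal training-step skeleton with the relevant knobs (the interval values are placeholders, not icefall's settings):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the grad_scale this stretch starts from
    growth_factor=2.0,     # doubled after `growth_interval` clean steps
    backoff_factor=0.5,    # halved on a step with inf/nan grads: 32 -> 16 -> 8
    growth_interval=2000,  # assumed value; sketch only
)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if grads overflowed
    scaler.update()          # adjusts the scale, as seen in the log
    return loss.detach(), scaler.get_scale()
```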
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:16:37,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3532413.3333333335, ans=0.125 2023-11-26 19:16:38,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2023-11-26 19:16:46,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3532413.3333333335, ans=0.0 2023-11-26 19:17:06,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3532546.6666666665, ans=0.2 2023-11-26 19:17:10,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532546.6666666665, ans=0.1 2023-11-26 19:17:12,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2023-11-26 19:17:14,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-26 19:17:14,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-26 19:17:15,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3532613.3333333335, ans=0.2 2023-11-26 19:17:18,526 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529900 2023-11-26 19:17:22,615 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 850, loss[loss=0.08069, simple_loss=0.1124, pruned_loss=0.01398, audio_tagging_loss=0.0105, over 16063.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08883, pruned_loss=0.01203, audio_tagging_loss=0.009021, over 3011491.49 frames. ], batch size: 62, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:17:31,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3532680.0, ans=0.125 2023-11-26 19:17:31,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3532680.0, ans=0.125 2023-11-26 19:17:38,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3532746.6666666665, ans=0.1 2023-11-26 19:17:51,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3532813.3333333335, ans=0.07 2023-11-26 19:18:01,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. 
limit=15.0 2023-11-26 19:18:11,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.004e+01 9.716e+01 1.045e+02 1.656e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 19:18:13,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529950 2023-11-26 19:18:17,650 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:18:18,469 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 900, loss[loss=0.06338, simple_loss=0.08695, pruned_loss=0.01116, audio_tagging_loss=0.00874, over 14596.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08835, pruned_loss=0.01203, audio_tagging_loss=0.009164, over 3009320.12 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:18:48,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3533146.6666666665, ans=0.0 2023-11-26 19:19:04,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3533280.0, ans=0.04949747468305833 2023-11-26 19:19:09,845 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530000 2023-11-26 19:19:14,245 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 950, loss[loss=0.0485, simple_loss=0.06951, pruned_loss=0.006578, audio_tagging_loss=0.007167, over 14004.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08835, pruned_loss=0.0119, audio_tagging_loss=0.00916, over 3019431.61 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:19:24,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-26 19:19:25,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2023-11-26 19:19:27,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-26 19:19:27,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3533413.3333333335, ans=0.0 2023-11-26 19:19:28,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-26 19:19:40,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3533480.0, ans=0.125 2023-11-26 19:19:46,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3533546.6666666665, ans=0.125 2023-11-26 19:19:46,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3533546.6666666665, ans=0.125 2023-11-26 19:20:03,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.669e+01 9.436e+01 1.000e+02 1.329e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 19:20:04,918 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530050 2023-11-26 19:20:09,186 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1000, loss[loss=0.07051, simple_loss=0.1, pruned_loss=0.01254, audio_tagging_loss=0.007948, over 16336.00 frames. 
], tot_loss[loss=0.06513, simple_loss=0.0885, pruned_loss=0.01191, audio_tagging_loss=0.008972, over 3025509.94 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:20:33,244 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:20:37,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3533813.3333333335, ans=0.125 2023-11-26 19:20:43,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3533880.0, ans=0.125 2023-11-26 19:21:00,092 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530100 2023-11-26 19:21:04,780 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1050, loss[loss=0.0523, simple_loss=0.07586, pruned_loss=0.008356, audio_tagging_loss=0.00601, over 15087.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08781, pruned_loss=0.01185, audio_tagging_loss=0.008856, over 3034361.67 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:21:15,336 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:21:22,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3534080.0, ans=0.125 2023-11-26 19:21:43,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3534213.3333333335, ans=0.125 2023-11-26 19:21:52,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3534280.0, ans=0.0 2023-11-26 19:21:54,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.897e+01 9.575e+01 1.026e+02 1.368e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 19:21:55,824 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530150 2023-11-26 19:21:59,997 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1100, loss[loss=0.09184, simple_loss=0.1278, pruned_loss=0.02138, audio_tagging_loss=0.006588, over 15169.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08804, pruned_loss=0.01182, audio_tagging_loss=0.008818, over 3042683.84 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:22:01,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-26 19:22:02,113 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 19:22:02,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-26 19:22:02,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534346.6666666665, ans=0.1 2023-11-26 19:22:09,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3534413.3333333335, ans=0.1 2023-11-26 19:22:30,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3534480.0, ans=0.0 2023-11-26 19:22:34,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3534546.6666666665, ans=0.125 2023-11-26 19:22:50,919 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-26 19:22:55,384 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1150, loss[loss=0.07587, simple_loss=0.1066, pruned_loss=0.01675, audio_tagging_loss=0.005819, over 15303.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08813, pruned_loss=0.01182, audio_tagging_loss=0.008803, over 3042487.07 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:22:59,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3534680.0, ans=0.125 2023-11-26 19:23:05,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-26 19:23:07,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-26 19:23:11,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3534746.6666666665, ans=0.09899494936611666 2023-11-26 19:23:21,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3534813.3333333335, ans=0.125 2023-11-26 19:23:22,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3534813.3333333335, ans=0.125 2023-11-26 19:23:24,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3534813.3333333335, ans=0.0 2023-11-26 19:23:44,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.885e+01 9.403e+01 1.020e+02 1.405e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 19:23:46,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-26 19:23:50,184 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1200, loss[loss=0.04588, simple_loss=0.06187, pruned_loss=0.005779, audio_tagging_loss=0.009163, over 14690.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08932, pruned_loss=0.01192, audio_tagging_loss=0.008704, over 3047597.47 frames. 
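Editor's note: the "Exclude cut" WARNINGs above are the expected handling of the AudioSet placeholder transcripts: a 1-second clip yields 100 feature frames but only 23 encoder frames after subsampling, fewer than its 24 BPE tokens, and a transducer loss cannot align a label sequence longer than the encoder output, so the cut is dropped. A hedged sketch of such a filter over lhotse cuts (the predicate train_asr.py actually uses may check more conditions):

```python
def keep_cut(cut, sp) -> bool:
    """Return False for cuts the transducer loss cannot align.

    Drops a cut when its encoder output is shorter than its token
    sequence. `sp` is the sentencepiece BPE model. Sketch only.
    """
    num_frames = cut.num_frames                 # 100 for a 1 s clip
    # Front-end arithmetic inferred from the log: ((100 - 7) // 2) // 2
    # gives 23, consistent with the warning's "after subsampling" count.
    frames_after = ((num_frames - 7) // 2) // 2
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    return frames_after >= len(tokens)

# Applied lazily over the training cuts, e.g.:
# train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))
```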
], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:23:53,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3535013.3333333335, ans=0.0 2023-11-26 19:23:57,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-26 19:24:07,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2023-11-26 19:24:43,537 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-26 19:24:47,719 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1250, loss[loss=0.07362, simple_loss=0.1031, pruned_loss=0.01485, audio_tagging_loss=0.007225, over 15648.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08972, pruned_loss=0.01208, audio_tagging_loss=0.008566, over 3045767.19 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:25:20,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3535546.6666666665, ans=0.0 2023-11-26 19:25:30,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3535613.3333333335, ans=0.2 2023-11-26 19:25:34,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3535613.3333333335, ans=0.0 2023-11-26 19:25:38,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.575e+01 1.052e+02 2.949e+02, threshold=1.915e+02, percent-clipped=1.0 2023-11-26 19:25:38,922 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-26 19:25:43,058 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1300, loss[loss=0.04947, simple_loss=0.06158, pruned_loss=0.00954, audio_tagging_loss=0.009139, over 15259.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.0904, pruned_loss=0.01223, audio_tagging_loss=0.008532, over 3041931.60 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:25:54,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=12.0 2023-11-26 19:26:05,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3535813.3333333335, ans=0.0 2023-11-26 19:26:06,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-11-26 19:26:21,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3535880.0, ans=0.2 2023-11-26 19:26:27,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2023-11-26 19:26:27,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-11-26 19:26:33,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-26 19:26:38,077 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1350, loss[loss=0.0645, simple_loss=0.09186, pruned_loss=0.01114, audio_tagging_loss=0.007432, over 15840.00 frames. 
], tot_loss[loss=0.06571, simple_loss=0.09003, pruned_loss=0.01212, audio_tagging_loss=0.008583, over 3040025.23 frames. ], batch size: 62, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:26:48,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3536013.3333333335, ans=0.2 2023-11-26 19:27:18,163 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:27:30,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.761e+01 9.344e+01 1.009e+02 1.308e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 19:27:30,643 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-26 19:27:34,962 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1400, loss[loss=0.07772, simple_loss=0.1059, pruned_loss=0.01661, audio_tagging_loss=0.008151, over 15495.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08974, pruned_loss=0.01213, audio_tagging_loss=0.008662, over 3036821.37 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:28:15,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-26 19:28:18,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3536613.3333333335, ans=0.05 2023-11-26 19:28:20,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3536613.3333333335, ans=0.0 2023-11-26 19:28:26,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-26 19:28:26,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3536613.3333333335, ans=0.125 2023-11-26 19:28:30,898 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1450, loss[loss=0.05279, simple_loss=0.06947, pruned_loss=0.009141, audio_tagging_loss=0.008917, over 14405.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09014, pruned_loss=0.01218, audio_tagging_loss=0.008661, over 3040671.26 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:28:34,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3536680.0, ans=0.0 2023-11-26 19:28:35,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3536680.0, ans=0.1 2023-11-26 19:28:40,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. 
limit=15.0 2023-11-26 19:29:22,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.955e+01 9.646e+01 1.028e+02 1.353e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 19:29:22,244 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-26 19:29:26,584 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1500, loss[loss=0.05706, simple_loss=0.07765, pruned_loss=0.00941, audio_tagging_loss=0.008827, over 14729.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0894, pruned_loss=0.01225, audio_tagging_loss=0.008657, over 3041620.68 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:29:36,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=12.0 2023-11-26 19:29:40,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-26 19:29:43,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3537080.0, ans=15.0 2023-11-26 19:29:47,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3537080.0, ans=0.05 2023-11-26 19:29:58,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3537146.6666666665, ans=0.0 2023-11-26 19:30:03,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3537213.3333333335, ans=0.035 2023-11-26 19:30:05,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0 2023-11-26 19:30:18,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-26 19:30:23,553 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1550, loss[loss=0.05591, simple_loss=0.07392, pruned_loss=0.0109, audio_tagging_loss=0.008059, over 14881.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.0899, pruned_loss=0.01238, audio_tagging_loss=0.008735, over 3038385.53 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:30:35,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3537413.3333333335, ans=0.125 2023-11-26 19:31:03,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3537546.6666666665, ans=0.125 2023-11-26 19:31:09,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3537613.3333333335, ans=0.0 2023-11-26 19:31:14,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 9.127e+01 9.575e+01 1.042e+02 1.304e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 19:31:14,345 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-26 19:31:15,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3537613.3333333335, ans=0.125 2023-11-26 19:31:18,492 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1600, loss[loss=0.05871, simple_loss=0.07525, pruned_loss=0.009687, audio_tagging_loss=0.0114, over 14768.00 frames. 
], tot_loss[loss=0.06611, simple_loss=0.08972, pruned_loss=0.01243, audio_tagging_loss=0.00882, over 3037397.83 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:31:24,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3537680.0, ans=0.125 2023-11-26 19:31:53,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3537880.0, ans=0.125 2023-11-26 19:32:09,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3537946.6666666665, ans=0.125 2023-11-26 19:32:10,456 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530700 2023-11-26 19:32:14,670 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1650, loss[loss=0.06245, simple_loss=0.08259, pruned_loss=0.01329, audio_tagging_loss=0.007867, over 14507.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08993, pruned_loss=0.0123, audio_tagging_loss=0.008762, over 3034763.62 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:32:31,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3538080.0, ans=0.1 2023-11-26 19:33:06,462 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530750 2023-11-26 19:33:08,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.661e+01 9.270e+01 1.008e+02 1.328e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 19:33:11,682 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1700, loss[loss=0.06866, simple_loss=0.08851, pruned_loss=0.01436, audio_tagging_loss=0.01005, over 14908.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08951, pruned_loss=0.01207, audio_tagging_loss=0.008861, over 3042870.42 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:33:13,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2023-11-26 19:33:26,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3538413.3333333335, ans=0.0 2023-11-26 19:33:34,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3538480.0, ans=0.1 2023-11-26 19:33:37,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3538480.0, ans=0.1 2023-11-26 19:33:42,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3538480.0, ans=0.125 2023-11-26 19:34:03,010 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530800 2023-11-26 19:34:07,595 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1750, loss[loss=0.06158, simple_loss=0.08589, pruned_loss=0.009383, audio_tagging_loss=0.009249, over 15005.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0897, pruned_loss=0.01209, audio_tagging_loss=0.008769, over 3047206.13 frames. 
], batch size: 58, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:34:07,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3538680.0, ans=0.125 2023-11-26 19:34:23,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3538746.6666666665, ans=0.125 2023-11-26 19:34:24,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3538746.6666666665, ans=0.0 2023-11-26 19:34:25,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3538746.6666666665, ans=0.2 2023-11-26 19:34:36,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538813.3333333335, ans=0.1 2023-11-26 19:34:40,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=12.0 2023-11-26 19:34:41,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3538880.0, ans=0.2 2023-11-26 19:34:46,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3538880.0, ans=0.04949747468305833 2023-11-26 19:34:58,823 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530850 2023-11-26 19:35:00,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 9.072e+01 9.578e+01 1.039e+02 1.393e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 19:35:03,576 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1800, loss[loss=0.07834, simple_loss=0.1123, pruned_loss=0.01504, audio_tagging_loss=0.00714, over 14671.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09033, pruned_loss=0.01206, audio_tagging_loss=0.00865, over 3050536.35 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:35:07,275 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5 2023-11-26 19:35:23,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3539080.0, ans=0.2 2023-11-26 19:35:25,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3539146.6666666665, ans=0.125 2023-11-26 19:35:31,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3539146.6666666665, ans=0.125 2023-11-26 19:35:39,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-11-26 19:35:48,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3539280.0, ans=0.125 2023-11-26 19:35:55,077 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530900 2023-11-26 19:35:59,875 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1850, loss[loss=0.06063, simple_loss=0.07748, pruned_loss=0.01088, audio_tagging_loss=0.01101, over 14976.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08863, pruned_loss=0.01193, audio_tagging_loss=0.008726, over 3044104.18 frames. 
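Editor's note: many of the scheduled names in this log belong to balancers (balancer1.prob, min_positive, max_abs, min_abs): modules that keep per-channel activation statistics inside a target band, intervening only with the scheduled probability prob and only in the backward pass, leaving the forward computation untouched. As a rough illustration of the idea, assuming a small corrective gradient is added when a channel's mean absolute value leaves [min_abs, max_abs] (scaling.py's Balancer is considerably more sophisticated):

```python
import torch

class SoftClampStats(torch.autograd.Function):
    """Identity in forward; backward nudges channels whose mean
    absolute value is outside [min_abs, max_abs]. Illustrative only;
    the logged `prob` would gate whether this correction runs at all."""

    @staticmethod
    def forward(ctx, x, min_abs=0.2, max_abs=10.0, penalty=0.01):
        ctx.save_for_backward(x)
        ctx.min_abs, ctx.max_abs, ctx.penalty = min_abs, max_abs, penalty
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        mean_abs = x.abs().mean(dim=0, keepdim=True)   # per-channel stat
        too_small = (mean_abs < ctx.min_abs).float()
        too_big = (mean_abs > ctx.max_abs).float()
        # Push over-large channels inward, over-small channels outward.
        extra = ctx.penalty * (too_big - too_small) * x.sign()
        return grad_out + extra, None, None, None

x = torch.randn(16, 256, requires_grad=True)
y = SoftClampStats.apply(x)
y.sum().backward()
```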
], batch size: 57, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:36:21,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=22.5 2023-11-26 19:36:27,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3539480.0, ans=0.125 2023-11-26 19:36:27,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3539480.0, ans=0.2 2023-11-26 19:36:28,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3539480.0, ans=0.0 2023-11-26 19:36:35,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3539546.6666666665, ans=0.125 2023-11-26 19:36:47,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3539613.3333333335, ans=0.125 2023-11-26 19:36:51,411 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530950 2023-11-26 19:36:52,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3539613.3333333335, ans=0.125 2023-11-26 19:36:53,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.894e+01 9.427e+01 1.010e+02 7.555e+02, threshold=1.885e+02, percent-clipped=1.0 2023-11-26 19:36:55,630 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1900, loss[loss=0.05718, simple_loss=0.07852, pruned_loss=0.00814, audio_tagging_loss=0.00978, over 15843.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08844, pruned_loss=0.01197, audio_tagging_loss=0.008659, over 3048973.68 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:37:17,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3539813.3333333335, ans=0.0 2023-11-26 19:37:23,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3539813.3333333335, ans=0.1 2023-11-26 19:37:34,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3539880.0, ans=10.0 2023-11-26 19:37:46,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531000 2023-11-26 19:37:49,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3540013.3333333335, ans=0.125 2023-11-26 19:37:50,786 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1950, loss[loss=0.06546, simple_loss=0.08807, pruned_loss=0.01189, audio_tagging_loss=0.00954, over 14396.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08859, pruned_loss=0.01199, audio_tagging_loss=0.008622, over 3044669.11 frames. 
], batch size: 53, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:37:52,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3540013.3333333335, ans=0.2 2023-11-26 19:38:02,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3540080.0, ans=0.125 2023-11-26 19:38:27,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3540213.3333333335, ans=0.025 2023-11-26 19:38:35,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=22.5 2023-11-26 19:38:35,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2023-11-26 19:38:42,142 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531050 2023-11-26 19:38:44,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.819e+01 9.302e+01 9.928e+01 1.179e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 19:38:46,886 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2000, loss[loss=0.09008, simple_loss=0.1166, pruned_loss=0.02, audio_tagging_loss=0.01177, over 16082.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08906, pruned_loss=0.01207, audio_tagging_loss=0.008613, over 3041458.45 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:38:48,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2023-11-26 19:39:04,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3540413.3333333335, ans=0.125 2023-11-26 19:39:09,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3540480.0, ans=0.125 2023-11-26 19:39:19,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3540546.6666666665, ans=0.0 2023-11-26 19:39:35,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2023-11-26 19:39:38,331 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531100 2023-11-26 19:39:41,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3540680.0, ans=0.125 2023-11-26 19:39:42,526 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2050, loss[loss=0.07518, simple_loss=0.1096, pruned_loss=0.01296, audio_tagging_loss=0.007418, over 15406.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08905, pruned_loss=0.01206, audio_tagging_loss=0.008621, over 3039414.77 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:39:56,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540746.6666666665, ans=0.125 2023-11-26 19:40:01,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. 
limit=10.0 2023-11-26 19:40:11,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3540813.3333333335, ans=0.125 2023-11-26 19:40:15,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540880.0, ans=0.1 2023-11-26 19:40:17,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3540880.0, ans=0.125 2023-11-26 19:40:33,205 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531150 2023-11-26 19:40:35,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.849e+01 9.497e+01 1.034e+02 1.158e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 19:40:37,450 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2100, loss[loss=0.05101, simple_loss=0.07148, pruned_loss=0.005715, audio_tagging_loss=0.009558, over 14506.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08974, pruned_loss=0.01223, audio_tagging_loss=0.008546, over 3048392.22 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:40:43,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3541013.3333333335, ans=0.05 2023-11-26 19:40:43,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3541013.3333333335, ans=0.2 2023-11-26 19:40:58,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3541080.0, ans=0.125 2023-11-26 19:41:09,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3541146.6666666665, ans=0.125 2023-11-26 19:41:10,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2023-11-26 19:41:10,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3541213.3333333335, ans=0.125 2023-11-26 19:41:15,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3541213.3333333335, ans=0.125 2023-11-26 19:41:28,839 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531200 2023-11-26 19:41:29,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541280.0, ans=0.1 2023-11-26 19:41:33,343 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2150, loss[loss=0.06503, simple_loss=0.08593, pruned_loss=0.01299, audio_tagging_loss=0.009069, over 14863.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08948, pruned_loss=0.01231, audio_tagging_loss=0.008561, over 3038166.47 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:41:47,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-26 19:41:48,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541413.3333333335, ans=0.1 2023-11-26 19:41:53,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-26 19:42:03,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3541480.0, ans=0.125 2023-11-26 19:42:04,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3541480.0, ans=0.125 2023-11-26 19:42:06,944 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:42:13,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3541546.6666666665, ans=0.025 2023-11-26 19:42:19,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3541613.3333333335, ans=0.125 2023-11-26 19:42:25,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541613.3333333335, ans=0.1 2023-11-26 19:42:26,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531250 2023-11-26 19:42:28,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.897e+01 9.501e+01 1.024e+02 1.712e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 19:42:28,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-26 19:42:29,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=22.5 2023-11-26 19:42:30,395 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2200, loss[loss=0.07945, simple_loss=0.1185, pruned_loss=0.01511, audio_tagging_loss=0.005095, over 15234.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09017, pruned_loss=0.0125, audio_tagging_loss=0.008645, over 3039777.58 frames. 
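
The WARNING entries drop AudioSet clips that carry a placeholder transcript: after the encoder's 4x subsampling a 1-second clip keeps only 23 frames, fewer than its 24 BPE tokens, and the pruned transducer loss needs at least one frame per emitted token. A hedged sketch of such a filter; the exact frame-count arithmetic is an assumption, while the 100 -> 23 mapping comes from the log itself:

    # Hedged sketch of the kind of filter that produces the WARNING above:
    # drop cuts whose subsampled frame count cannot cover the token sequence.
    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        # Rough encoder frame count after subsampling, e.g. 100 -> 23.
        # The -7 offset is an assumption chosen to match the logged numbers.
        t = (num_frames - 7) // subsampling_factor
        return t >= num_tokens

    # The logged case: 100 input frames -> 23 encoder frames < 24 tokens.
    assert keep_cut(100, 24) is False
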
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:42:31,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3541680.0, ans=0.2 2023-11-26 19:43:09,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3541880.0, ans=0.0 2023-11-26 19:43:15,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3541946.6666666665, ans=0.2 2023-11-26 19:43:21,590 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531300 2023-11-26 19:43:24,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3542013.3333333335, ans=0.1 2023-11-26 19:43:25,855 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2250, loss[loss=0.0786, simple_loss=0.106, pruned_loss=0.01944, audio_tagging_loss=0.006155, over 15100.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09055, pruned_loss=0.01268, audio_tagging_loss=0.008686, over 3034339.56 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:43:36,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3542080.0, ans=0.125 2023-11-26 19:44:04,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2023-11-26 19:44:04,699 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:44:13,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3542280.0, ans=0.0 2023-11-26 19:44:17,328 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531350 2023-11-26 19:44:19,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.901e+01 8.880e+01 9.630e+01 1.027e+02 2.263e+02, threshold=1.926e+02, percent-clipped=1.0 2023-11-26 19:44:21,513 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2300, loss[loss=0.1156, simple_loss=0.1661, pruned_loss=0.02505, audio_tagging_loss=0.007521, over 15944.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09108, pruned_loss=0.01266, audio_tagging_loss=0.008654, over 3039644.24 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:44:35,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3542413.3333333335, ans=0.0 2023-11-26 19:44:37,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3542413.3333333335, ans=0.125 2023-11-26 19:44:40,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3542413.3333333335, ans=0.0 2023-11-26 19:44:43,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3542413.3333333335, ans=0.125 2023-11-26 19:44:51,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3542480.0, ans=0.0 2023-11-26 19:45:00,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=15.0 2023-11-26 19:45:00,748 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=12.0 2023-11-26 19:45:11,712 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:45:14,408 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531400 2023-11-26 19:45:16,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2023-11-26 19:45:18,977 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2350, loss[loss=0.05277, simple_loss=0.07245, pruned_loss=0.007252, audio_tagging_loss=0.009293, over 14943.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09099, pruned_loss=0.01258, audio_tagging_loss=0.008791, over 3043525.80 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:45:30,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3542746.6666666665, ans=0.2 2023-11-26 19:45:52,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3542880.0, ans=0.0 2023-11-26 19:45:57,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-11-26 19:46:08,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3542946.6666666665, ans=0.2 2023-11-26 19:46:10,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2023-11-26 19:46:10,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531450 2023-11-26 19:46:12,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.878e+01 9.481e+01 9.940e+01 1.139e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 19:46:13,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3542946.6666666665, ans=0.0 2023-11-26 19:46:14,882 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2400, loss[loss=0.06662, simple_loss=0.09445, pruned_loss=0.0113, audio_tagging_loss=0.008099, over 14428.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09049, pruned_loss=0.01242, audio_tagging_loss=0.008904, over 3041003.40 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:46:16,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543013.3333333335, ans=0.1 2023-11-26 19:46:19,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=22.5 2023-11-26 19:46:42,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.34 vs. 
limit=15.0 2023-11-26 19:47:04,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3543280.0, ans=0.0 2023-11-26 19:47:05,852 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531500 2023-11-26 19:47:10,007 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2450, loss[loss=0.04868, simple_loss=0.05955, pruned_loss=0.006619, audio_tagging_loss=0.01229, over 14085.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09024, pruned_loss=0.01233, audio_tagging_loss=0.008898, over 3041029.66 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:47:17,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3543346.6666666665, ans=0.5 2023-11-26 19:47:28,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3543413.3333333335, ans=0.125 2023-11-26 19:47:38,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3543480.0, ans=0.125 2023-11-26 19:47:58,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3543613.3333333335, ans=0.125 2023-11-26 19:48:02,276 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531550 2023-11-26 19:48:05,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.383e+01 9.958e+01 1.270e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 19:48:07,071 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2500, loss[loss=0.06786, simple_loss=0.07612, pruned_loss=0.01548, audio_tagging_loss=0.01432, over 15206.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08892, pruned_loss=0.01204, audio_tagging_loss=0.009029, over 3038545.93 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:48:29,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3543813.3333333335, ans=0.0 2023-11-26 19:48:42,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3543880.0, ans=0.125 2023-11-26 19:48:53,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3543946.6666666665, ans=0.125 2023-11-26 19:48:53,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=12.0 2023-11-26 19:48:58,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531600 2023-11-26 19:48:58,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3543946.6666666665, ans=0.125 2023-11-26 19:49:03,329 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2550, loss[loss=0.04268, simple_loss=0.05611, pruned_loss=0.005805, audio_tagging_loss=0.008816, over 13991.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08876, pruned_loss=0.01196, audio_tagging_loss=0.008952, over 3032946.46 frames. 
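
Each optim.py:476 line reports a five-number summary (min, lower quartile, median, upper quartile, max) of recent gradient norms, and in every entry above the printed threshold equals Clipping_scale times the median, e.g. 2.0 x 9.427e+01 ~ 1.885e+02. A hedged sketch of that bookkeeping; the window size and whether percent-clipped is a percentage or a fraction are assumptions:

    import numpy as np

    # Hedged reconstruction of the "Clipping_scale=..." report: the clipping
    # threshold tracks clipping_scale times the median recent gradient norm.
    def grad_norm_report(recent_norms: list[float], clipping_scale: float = 2.0):
        q = np.quantile(recent_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * q[2]  # 2.0 * median, as seen in the log
        # Treated as a percent here; the log's unit is an assumption.
        pct_clipped = 100.0 * np.mean(np.asarray(recent_norms) > threshold)
        return q, threshold, pct_clipped

    q, thr, pct = grad_norm_report([74.68, 88.94, 94.27, 101.0, 755.5])
    # thr ~ 2.0 * 94.27 = 188.54, matching "threshold=1.885e+02" above
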
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:49:04,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3544013.3333333335, ans=0.125 2023-11-26 19:49:09,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3544013.3333333335, ans=6.0 2023-11-26 19:49:19,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3544080.0, ans=0.125 2023-11-26 19:49:39,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2023-11-26 19:49:44,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3544213.3333333335, ans=0.125 2023-11-26 19:49:49,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3544280.0, ans=0.5 2023-11-26 19:49:53,874 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2023-11-26 19:49:54,491 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531650 2023-11-26 19:49:57,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.904e+01 9.624e+01 1.038e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 19:49:58,728 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2600, loss[loss=0.043, simple_loss=0.06469, pruned_loss=0.004419, audio_tagging_loss=0.00624, over 15083.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08814, pruned_loss=0.0119, audio_tagging_loss=0.008783, over 3031318.92 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:50:15,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3544413.3333333335, ans=0.125 2023-11-26 19:50:16,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3544413.3333333335, ans=0.125 2023-11-26 19:50:23,141 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:50:26,338 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:50:50,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531700 2023-11-26 19:50:56,191 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2650, loss[loss=0.06127, simple_loss=0.08112, pruned_loss=0.01151, audio_tagging_loss=0.009197, over 14857.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08862, pruned_loss=0.01192, audio_tagging_loss=0.008642, over 3035100.39 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:51:15,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. 
limit=15.0 2023-11-26 19:51:17,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3544813.3333333335, ans=0.125 2023-11-26 19:51:29,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3544880.0, ans=0.07 2023-11-26 19:51:47,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531750 2023-11-26 19:51:50,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.824e+01 9.475e+01 1.010e+02 1.281e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 19:51:51,587 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2700, loss[loss=0.07123, simple_loss=0.09233, pruned_loss=0.01614, audio_tagging_loss=0.008928, over 15178.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.0878, pruned_loss=0.01195, audio_tagging_loss=0.00869, over 3043174.00 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:52:14,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3545146.6666666665, ans=10.0 2023-11-26 19:52:21,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3545146.6666666665, ans=0.05 2023-11-26 19:52:22,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3545146.6666666665, ans=0.125 2023-11-26 19:52:31,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3545213.3333333335, ans=10.0 2023-11-26 19:52:36,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3545280.0, ans=0.09899494936611666 2023-11-26 19:52:41,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3545280.0, ans=0.0 2023-11-26 19:52:42,949 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531800 2023-11-26 19:52:47,356 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2750, loss[loss=0.08108, simple_loss=0.1093, pruned_loss=0.01805, audio_tagging_loss=0.008362, over 16189.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08807, pruned_loss=0.01204, audio_tagging_loss=0.008628, over 3048018.21 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:53:14,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3545480.0, ans=0.125 2023-11-26 19:53:16,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3545480.0, ans=0.2 2023-11-26 19:53:23,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3545546.6666666665, ans=0.125 2023-11-26 19:53:34,606 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 19:53:37,817 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531850 2023-11-26 19:53:42,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 8.910e+01 9.547e+01 1.021e+02 1.473e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 19:53:43,105 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2800, loss[loss=0.06093, simple_loss=0.08963, pruned_loss=0.009322, audio_tagging_loss=0.006791, over 16308.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08874, pruned_loss=0.01215, audio_tagging_loss=0.008542, over 3045491.22 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 19:53:46,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3545680.0, ans=0.125 2023-11-26 19:54:09,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3545813.3333333335, ans=0.2 2023-11-26 19:54:14,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3545813.3333333335, ans=0.125 2023-11-26 19:54:20,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=22.5 2023-11-26 19:54:31,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3545946.6666666665, ans=0.0 2023-11-26 19:54:34,808 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531900 2023-11-26 19:54:34,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3545946.6666666665, ans=0.0 2023-11-26 19:54:38,991 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2850, loss[loss=0.07102, simple_loss=0.09796, pruned_loss=0.01355, audio_tagging_loss=0.008488, over 16207.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08879, pruned_loss=0.01209, audio_tagging_loss=0.008557, over 3044641.66 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 19:55:08,620 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:55:10,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3546146.6666666665, ans=0.125 2023-11-26 19:55:16,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3546213.3333333335, ans=0.125 2023-11-26 19:55:20,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3546213.3333333335, ans=0.125 2023-11-26 19:55:30,551 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531950 2023-11-26 19:55:34,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.835e+01 9.315e+01 9.874e+01 1.722e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 19:55:34,734 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2900, loss[loss=0.08558, simple_loss=0.1182, pruned_loss=0.01857, audio_tagging_loss=0.007888, over 14778.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08917, pruned_loss=0.0121, audio_tagging_loss=0.008587, over 3044251.82 frames. 
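
The scaling.py:213 entries print ScheduledFloat hyperparameters (dropout probabilities, skip rates, scale_min values) whose value is a function of batch_count; by this point in training, around batch_count 3.54e6, they have all settled at their final constants. A sketch of a piecewise-linear schedule of that flavor, with illustrative breakpoints that are assumptions, not the recipe's actual settings:

    # Minimal sketch of a batch-count-indexed schedule like the ScheduledFloat
    # values printed above.
    class PiecewiseLinearSchedule:
        def __init__(self, *points: tuple[float, float]):
            # points: (batch_count, value) pairs
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # E.g. a dropout that decays from 0.3 to 0.1 and then stays flat, so that
    # late in training (batch_count ~ 3.54e6) it prints as 0.1, as seen above.
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p(3539480.0) == 0.1
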
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:55:52,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=22.5 2023-11-26 19:56:17,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3546546.6666666665, ans=0.07 2023-11-26 19:56:22,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5 2023-11-26 19:56:23,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3546613.3333333335, ans=0.0 2023-11-26 19:56:26,689 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532000 2023-11-26 19:56:33,903 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2950, loss[loss=0.06436, simple_loss=0.08421, pruned_loss=0.01154, audio_tagging_loss=0.01072, over 15287.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08948, pruned_loss=0.01221, audio_tagging_loss=0.00861, over 3048044.52 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:57:23,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3546946.6666666665, ans=0.125 2023-11-26 19:57:25,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3546946.6666666665, ans=0.125 2023-11-26 19:57:25,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3546946.6666666665, ans=0.125 2023-11-26 19:57:26,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532050 2023-11-26 19:57:30,291 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.991e+01 9.608e+01 1.049e+02 1.344e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 19:57:30,316 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3000, loss[loss=0.07802, simple_loss=0.09895, pruned_loss=0.01928, audio_tagging_loss=0.009263, over 15017.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09006, pruned_loss=0.01248, audio_tagging_loss=0.008705, over 3047972.49 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:57:30,317 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 19:58:03,036 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05745, simple_loss=0.05048, pruned_loss=0.005228, audio_tagging_loss=0.02698, over 4681554.00 frames. 2023-11-26 19:58:03,037 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 19:58:06,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. 
limit=15.0 2023-11-26 19:58:17,757 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:58:29,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3547146.6666666665, ans=0.125 2023-11-26 19:58:54,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532100 2023-11-26 19:58:57,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3547280.0, ans=0.125 2023-11-26 19:59:00,131 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3050, loss[loss=0.05493, simple_loss=0.06491, pruned_loss=0.01487, audio_tagging_loss=0.007601, over 14011.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09104, pruned_loss=0.01283, audio_tagging_loss=0.008624, over 3051690.06 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:59:02,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3547346.6666666665, ans=0.07 2023-11-26 19:59:31,802 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:59:51,774 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532150 2023-11-26 19:59:55,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.969e+01 9.484e+01 1.021e+02 1.234e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 19:59:55,923 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3100, loss[loss=0.06413, simple_loss=0.07978, pruned_loss=0.01265, audio_tagging_loss=0.01159, over 14858.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09032, pruned_loss=0.0127, audio_tagging_loss=0.008732, over 3049278.13 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:00:01,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3547680.0, ans=0.125 2023-11-26 20:00:19,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3547813.3333333335, ans=0.0 2023-11-26 20:00:30,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3547880.0, ans=0.1 2023-11-26 20:00:42,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-11-26 20:00:45,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3547946.6666666665, ans=0.125 2023-11-26 20:00:47,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532200 2023-11-26 20:00:51,694 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3150, loss[loss=0.06435, simple_loss=0.08451, pruned_loss=0.01015, audio_tagging_loss=0.01194, over 16246.00 frames. 
], tot_loss[loss=0.0666, simple_loss=0.09041, pruned_loss=0.0126, audio_tagging_loss=0.008793, over 3044712.02 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:00:53,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3548013.3333333335, ans=0.125 2023-11-26 20:00:59,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-26 20:00:59,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=22.5 2023-11-26 20:01:28,103 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:01:30,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3548213.3333333335, ans=0.125 2023-11-26 20:01:43,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532250 2023-11-26 20:01:48,389 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3200, loss[loss=0.07109, simple_loss=0.08833, pruned_loss=0.01434, audio_tagging_loss=0.01258, over 15599.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09059, pruned_loss=0.01257, audio_tagging_loss=0.008839, over 3047565.61 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:01:49,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.654e+01 1.076e+02 1.284e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-26 20:01:54,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3548346.6666666665, ans=0.2 2023-11-26 20:02:23,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3548546.6666666665, ans=0.1 2023-11-26 20:02:32,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-26 20:02:35,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-26 20:02:35,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3548613.3333333335, ans=0.04949747468305833 2023-11-26 20:02:40,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532300 2023-11-26 20:02:43,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3548613.3333333335, ans=0.0 2023-11-26 20:02:44,874 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3250, loss[loss=0.07822, simple_loss=0.1076, pruned_loss=0.0167, audio_tagging_loss=0.007705, over 15576.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09045, pruned_loss=0.01251, audio_tagging_loss=0.008801, over 3046172.42 frames. 
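
The scaling.py:1022 entries compare a per-module whiteness statistic against a limit: the metric is near 1.0 when activations have an isotropic (white) covariance and grows as the spectrum concentrates in a few directions. The exact formula is not shown in the log, so the sketch below uses an assumed spectral-flatness measure with the same qualitative behavior:

    import torch

    # Illustrative (assumed) whiteness metric like "metric=... vs. limit=...":
    # equals 1.0 for perfectly white features, larger for lopsided spectra.
    # The actual formula in scaling.py may differ.
    def whiteness_metric(feats: torch.Tensor) -> float:
        # feats: (num_frames, num_channels), assumed zero-mean here
        c = feats.T @ feats / feats.shape[0]   # covariance estimate
        eig = torch.linalg.eigvalsh(c)         # eigen-spectrum of covariance
        n = eig.numel()
        return float(n * (eig ** 2).sum() / eig.sum() ** 2)
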
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:03:21,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3548880.0, ans=0.0 2023-11-26 20:03:35,905 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532350 2023-11-26 20:03:37,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3548946.6666666665, ans=0.125 2023-11-26 20:03:40,049 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3300, loss[loss=0.08967, simple_loss=0.1272, pruned_loss=0.01978, audio_tagging_loss=0.006293, over 15617.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09002, pruned_loss=0.01256, audio_tagging_loss=0.008901, over 3047793.69 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:03:41,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.931e+01 9.545e+01 1.032e+02 1.663e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 20:03:50,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3549080.0, ans=0.2 2023-11-26 20:04:08,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3549146.6666666665, ans=0.0 2023-11-26 20:04:20,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3549213.3333333335, ans=0.0 2023-11-26 20:04:31,002 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532400 2023-11-26 20:04:35,383 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3350, loss[loss=0.05725, simple_loss=0.07716, pruned_loss=0.009486, audio_tagging_loss=0.009184, over 14767.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09007, pruned_loss=0.01233, audio_tagging_loss=0.008872, over 3051352.30 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:04:39,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=22.5 2023-11-26 20:04:56,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=22.5 2023-11-26 20:05:27,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532450 2023-11-26 20:05:31,380 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3400, loss[loss=0.05956, simple_loss=0.08089, pruned_loss=0.009885, audio_tagging_loss=0.009229, over 15552.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08973, pruned_loss=0.01239, audio_tagging_loss=0.008783, over 3052335.73 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:05:33,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.780e+01 9.356e+01 1.019e+02 1.296e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 20:05:40,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3549680.0, ans=0.0 2023-11-26 20:05:42,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3549746.6666666665, ans=15.0 2023-11-26 20:05:45,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.28 vs. 
limit=15.0 2023-11-26 20:06:18,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3549946.6666666665, ans=0.1 2023-11-26 20:06:22,035 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532500 2023-11-26 20:06:26,250 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3450, loss[loss=0.05158, simple_loss=0.06933, pruned_loss=0.008125, audio_tagging_loss=0.008789, over 14394.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08929, pruned_loss=0.01235, audio_tagging_loss=0.008728, over 3045727.94 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:06:35,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=12.0 2023-11-26 20:06:40,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3550080.0, ans=0.125 2023-11-26 20:06:49,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=22.5 2023-11-26 20:06:54,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0 2023-11-26 20:07:03,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3550213.3333333335, ans=0.1 2023-11-26 20:07:06,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3550213.3333333335, ans=0.125 2023-11-26 20:07:17,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532550 2023-11-26 20:07:19,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3550280.0, ans=0.125 2023-11-26 20:07:21,478 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3500, loss[loss=0.06383, simple_loss=0.08642, pruned_loss=0.009711, audio_tagging_loss=0.01091, over 15589.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08979, pruned_loss=0.01236, audio_tagging_loss=0.008683, over 3050386.20 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:07:23,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.941e+01 9.512e+01 1.027e+02 1.407e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 20:07:42,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3550413.3333333335, ans=0.125 2023-11-26 20:07:45,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3550480.0, ans=0.125 2023-11-26 20:07:50,189 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 20:08:14,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532600 2023-11-26 20:08:14,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3550613.3333333335, ans=0.125 2023-11-26 20:08:19,287 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3550, loss[loss=0.06569, simple_loss=0.09234, pruned_loss=0.01293, audio_tagging_loss=0.006585, over 14851.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08967, pruned_loss=0.01234, audio_tagging_loss=0.008591, over 3040334.02 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:08:24,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3550680.0, ans=0.0 2023-11-26 20:08:37,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3550746.6666666665, ans=0.0 2023-11-26 20:08:40,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550813.3333333335, ans=0.1 2023-11-26 20:08:42,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3550813.3333333335, ans=0.0 2023-11-26 20:09:10,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532650 2023-11-26 20:09:14,577 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3600, loss[loss=0.07859, simple_loss=0.1037, pruned_loss=0.01844, audio_tagging_loss=0.008305, over 14798.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09027, pruned_loss=0.01239, audio_tagging_loss=0.008593, over 3041239.52 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:09:16,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.819e+01 9.427e+01 1.004e+02 1.284e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 20:09:17,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3551013.3333333335, ans=0.125 2023-11-26 20:09:21,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3551013.3333333335, ans=0.2 2023-11-26 20:09:30,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3551080.0, ans=0.1 2023-11-26 20:09:42,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3551146.6666666665, ans=22.5 2023-11-26 20:09:45,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551146.6666666665, ans=0.1 2023-11-26 20:09:48,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3551213.3333333335, ans=0.125 2023-11-26 20:09:53,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551213.3333333335, ans=0.1 2023-11-26 20:10:05,627 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532700 2023-11-26 20:10:05,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3551280.0, ans=0.125 2023-11-26 20:10:09,757 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3650, loss[loss=0.05694, simple_loss=0.08553, pruned_loss=0.006658, audio_tagging_loss=0.007521, over 14424.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09116, pruned_loss=0.01251, audio_tagging_loss=0.008488, over 3041547.86 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:11:02,776 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532750 2023-11-26 20:11:06,840 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3700, loss[loss=0.07698, simple_loss=0.1098, pruned_loss=0.01376, audio_tagging_loss=0.008308, over 16265.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09173, pruned_loss=0.01255, audio_tagging_loss=0.008471, over 3041357.58 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:11:08,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.774e+01 9.496e+01 1.016e+02 1.285e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 20:11:18,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3551746.6666666665, ans=0.2 2023-11-26 20:11:29,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3551813.3333333335, ans=0.2 2023-11-26 20:11:49,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3551880.0, ans=0.0 2023-11-26 20:11:58,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532800 2023-11-26 20:12:03,431 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3750, loss[loss=0.06198, simple_loss=0.08349, pruned_loss=0.01208, audio_tagging_loss=0.008155, over 15039.00 frames. 
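
The grad_scale field in the batch lines moves among 8.0, 16.0 and 32.0, which matches fp16 dynamic loss scaling: the scale doubles after a streak of overflow-free steps and halves when gradients overflow. A minimal sketch using PyTorch's standard GradScaler API; how the actual script wires it into its loop is an assumption:

    import torch

    # Minimal fp16 step with dynamic loss scaling; the doubling/halving of
    # scaler.get_scale() is what shows up as grad_scale 8.0 -> 16.0 -> 32.0.
    scaler = torch.cuda.amp.GradScaler()

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()   # backprop with the scaled loss
        scaler.step(optimizer)          # unscales grads, skips step on inf/nan
        scaler.update()                 # grow or shrink the scale
        return loss.detach(), scaler.get_scale()
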
], tot_loss[loss=0.0669, simple_loss=0.09188, pruned_loss=0.01254, audio_tagging_loss=0.00841, over 3044555.18 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:12:04,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-26 20:12:06,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3552013.3333333335, ans=0.1 2023-11-26 20:12:21,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3552080.0, ans=0.0 2023-11-26 20:12:28,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3552146.6666666665, ans=0.125 2023-11-26 20:12:34,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3552146.6666666665, ans=0.125 2023-11-26 20:12:41,467 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:12:50,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3552280.0, ans=15.0 2023-11-26 20:12:54,249 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532850 2023-11-26 20:12:58,458 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3800, loss[loss=0.05291, simple_loss=0.06642, pruned_loss=0.007246, audio_tagging_loss=0.01245, over 14119.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09099, pruned_loss=0.01241, audio_tagging_loss=0.008503, over 3046764.28 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:13:00,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.984e+01 9.632e+01 1.029e+02 1.593e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-26 20:13:15,016 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:13:19,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2023-11-26 20:13:20,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3552480.0, ans=0.1 2023-11-26 20:13:32,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3552546.6666666665, ans=0.0 2023-11-26 20:13:42,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3552613.3333333335, ans=0.125 2023-11-26 20:13:49,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532900 2023-11-26 20:13:50,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.11 vs. 
limit=15.0 2023-11-26 20:13:54,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.87 vs. limit=22.5 2023-11-26 20:13:54,534 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3850, loss[loss=0.07216, simple_loss=0.09301, pruned_loss=0.01709, audio_tagging_loss=0.008565, over 14567.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09052, pruned_loss=0.01235, audio_tagging_loss=0.008699, over 3048171.23 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:14:09,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3552746.6666666665, ans=0.125 2023-11-26 20:14:20,614 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:14:25,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3552813.3333333335, ans=0.2 2023-11-26 20:14:35,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3552880.0, ans=0.0 2023-11-26 20:14:43,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3552946.6666666665, ans=0.125 2023-11-26 20:14:45,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532950 2023-11-26 20:14:49,654 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3900, loss[loss=0.0646, simple_loss=0.09012, pruned_loss=0.009329, audio_tagging_loss=0.01021, over 14885.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08867, pruned_loss=0.0121, audio_tagging_loss=0.008846, over 3042894.99 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:14:52,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.931e+01 9.529e+01 1.011e+02 1.303e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 20:15:11,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2023-11-26 20:15:22,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553213.3333333335, ans=0.1 2023-11-26 20:15:22,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3553213.3333333335, ans=0.2 2023-11-26 20:15:31,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3553213.3333333335, ans=0.125 2023-11-26 20:15:39,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3553280.0, ans=0.125 2023-11-26 20:15:40,857 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533000 2023-11-26 20:15:45,314 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3950, loss[loss=0.07622, simple_loss=0.1051, pruned_loss=0.01431, audio_tagging_loss=0.009345, over 14907.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08946, pruned_loss=0.01235, audio_tagging_loss=0.008913, over 3046946.18 frames. 
], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:16:00,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3553413.3333333335, ans=0.015 2023-11-26 20:16:04,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3553413.3333333335, ans=0.2 2023-11-26 20:16:36,496 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533050 2023-11-26 20:16:42,277 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4000, loss[loss=0.06485, simple_loss=0.08761, pruned_loss=0.01327, audio_tagging_loss=0.007773, over 14327.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08929, pruned_loss=0.01234, audio_tagging_loss=0.008992, over 3039141.86 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:16:44,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 8.850e+01 9.399e+01 1.031e+02 1.680e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 20:16:54,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3553746.6666666665, ans=0.125 2023-11-26 20:16:56,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3553746.6666666665, ans=0.125 2023-11-26 20:17:23,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3553880.0, ans=0.125 2023-11-26 20:17:33,297 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533100 2023-11-26 20:17:37,517 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4050, loss[loss=0.04758, simple_loss=0.0637, pruned_loss=0.006871, audio_tagging_loss=0.008861, over 16209.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08873, pruned_loss=0.01227, audio_tagging_loss=0.008982, over 3038054.49 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:17:37,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3554013.3333333335, ans=0.125 2023-11-26 20:17:39,680 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:17:40,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2023-11-26 20:17:53,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2023-11-26 20:18:29,019 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533150 2023-11-26 20:18:29,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554280.0, ans=0.1 2023-11-26 20:18:33,721 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4100, loss[loss=0.07602, simple_loss=0.1125, pruned_loss=0.01521, audio_tagging_loss=0.004573, over 15221.00 frames. 
], tot_loss[loss=0.06589, simple_loss=0.08903, pruned_loss=0.01234, audio_tagging_loss=0.009037, over 3041422.48 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:18:36,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.812e+01 9.418e+01 1.019e+02 1.290e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 20:18:39,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3554346.6666666665, ans=0.125 2023-11-26 20:18:45,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3554413.3333333335, ans=0.0 2023-11-26 20:18:50,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3554413.3333333335, ans=0.0 2023-11-26 20:18:53,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3554413.3333333335, ans=0.125 2023-11-26 20:18:55,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3554480.0, ans=0.07 2023-11-26 20:19:04,356 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:19:10,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3554546.6666666665, ans=0.05 2023-11-26 20:19:13,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-26 20:19:24,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533200 2023-11-26 20:19:29,891 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4150, loss[loss=0.04343, simple_loss=0.05108, pruned_loss=0.006796, audio_tagging_loss=0.01109, over 15759.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08935, pruned_loss=0.01231, audio_tagging_loss=0.008901, over 3044682.64 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:19:33,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-26 20:19:38,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3554680.0, ans=22.5 2023-11-26 20:19:43,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3554746.6666666665, ans=0.125 2023-11-26 20:19:47,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3554746.6666666665, ans=0.125 2023-11-26 20:19:58,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3554813.3333333335, ans=0.0 2023-11-26 20:20:09,860 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 20:20:15,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3554946.6666666665, ans=0.125 2023-11-26 20:20:17,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3554946.6666666665, ans=0.125 2023-11-26 20:20:21,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3554946.6666666665, ans=0.1 2023-11-26 20:20:21,935 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533250 2023-11-26 20:20:26,211 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4200, loss[loss=0.07691, simple_loss=0.09409, pruned_loss=0.02039, audio_tagging_loss=0.009475, over 15465.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08912, pruned_loss=0.01218, audio_tagging_loss=0.008761, over 3038744.30 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:20:29,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.868e+01 9.396e+01 9.993e+01 1.238e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 20:21:02,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3555213.3333333335, ans=0.125 2023-11-26 20:21:16,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3555280.0, ans=0.1 2023-11-26 20:21:17,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533300 2023-11-26 20:21:19,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3555280.0, ans=0.0 2023-11-26 20:21:21,706 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4250, loss[loss=0.07854, simple_loss=0.1145, pruned_loss=0.01354, audio_tagging_loss=0.007757, over 16339.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08952, pruned_loss=0.01213, audio_tagging_loss=0.008654, over 3044181.23 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:21:21,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3555346.6666666665, ans=0.125 2023-11-26 20:21:41,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3555413.3333333335, ans=0.125 2023-11-26 20:21:44,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-26 20:21:47,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3555480.0, ans=0.0 2023-11-26 20:21:49,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3555480.0, ans=0.2 2023-11-26 20:22:02,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.08 vs. limit=6.0 2023-11-26 20:22:07,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. 
limit=12.0 2023-11-26 20:22:13,657 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533350 2023-11-26 20:22:16,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-26 20:22:17,930 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4300, loss[loss=0.06456, simple_loss=0.09132, pruned_loss=0.01012, audio_tagging_loss=0.008779, over 16044.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0913, pruned_loss=0.01239, audio_tagging_loss=0.008546, over 3055313.33 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:22:19,136 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2023-11-26 20:22:21,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.104e+01 9.879e+01 1.029e+02 1.419e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-26 20:22:26,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3555680.0, ans=0.125 2023-11-26 20:22:27,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555680.0, ans=0.1 2023-11-26 20:22:30,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3555746.6666666665, ans=0.1 2023-11-26 20:23:10,859 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533400 2023-11-26 20:23:15,330 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4350, loss[loss=0.0625, simple_loss=0.07625, pruned_loss=0.01238, audio_tagging_loss=0.01199, over 17099.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09032, pruned_loss=0.01229, audio_tagging_loss=0.008624, over 3051814.67 frames. ], batch size: 65, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:23:15,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3556013.3333333335, ans=0.125 2023-11-26 20:23:26,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-26 20:23:29,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3556080.0, ans=0.125 2023-11-26 20:23:31,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3556080.0, ans=0.125 2023-11-26 20:24:06,182 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533450 2023-11-26 20:24:10,374 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4400, loss[loss=0.07289, simple_loss=0.09183, pruned_loss=0.01889, audio_tagging_loss=0.008084, over 15688.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08975, pruned_loss=0.01215, audio_tagging_loss=0.008677, over 3055326.51 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:24:13,576 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.838e+01 9.451e+01 1.042e+02 1.230e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 20:24:15,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.50 vs. 
limit=15.0 2023-11-26 20:24:49,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3556546.6666666665, ans=0.07 2023-11-26 20:24:50,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3556546.6666666665, ans=0.0 2023-11-26 20:25:01,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533500 2023-11-26 20:25:06,068 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4450, loss[loss=0.07493, simple_loss=0.1004, pruned_loss=0.01651, audio_tagging_loss=0.008206, over 15259.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09039, pruned_loss=0.0122, audio_tagging_loss=0.008571, over 3057539.09 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:25:06,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3556680.0, ans=0.035 2023-11-26 20:25:06,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3556680.0, ans=0.2 2023-11-26 20:25:11,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3556680.0, ans=0.125 2023-11-26 20:25:15,357 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:25:17,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3556746.6666666665, ans=0.125 2023-11-26 20:25:34,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=15.0 2023-11-26 20:25:53,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3556946.6666666665, ans=0.2 2023-11-26 20:25:58,328 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533550 2023-11-26 20:25:59,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-26 20:26:02,376 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4500, loss[loss=0.06033, simple_loss=0.0843, pruned_loss=0.009723, audio_tagging_loss=0.008455, over 15488.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09112, pruned_loss=0.0123, audio_tagging_loss=0.008531, over 3058377.70 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:26:05,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.109e+01 9.005e+01 9.519e+01 1.049e+02 1.463e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 20:26:07,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. 
limit=6.0 2023-11-26 20:26:31,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3557146.6666666665, ans=0.125 2023-11-26 20:26:53,443 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533600 2023-11-26 20:26:55,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3557280.0, ans=15.0 2023-11-26 20:26:56,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3557280.0, ans=0.0 2023-11-26 20:26:57,878 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4550, loss[loss=0.06889, simple_loss=0.09055, pruned_loss=0.01431, audio_tagging_loss=0.009305, over 14069.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09215, pruned_loss=0.01253, audio_tagging_loss=0.008466, over 3055398.01 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:27:17,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3557413.3333333335, ans=0.0 2023-11-26 20:27:31,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.51 vs. limit=10.0 2023-11-26 20:27:40,701 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:27:40,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3557546.6666666665, ans=0.125 2023-11-26 20:27:49,225 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533650 2023-11-26 20:27:53,373 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4600, loss[loss=0.0938, simple_loss=0.1328, pruned_loss=0.022, audio_tagging_loss=0.005383, over 15126.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09245, pruned_loss=0.01268, audio_tagging_loss=0.008536, over 3058158.90 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:27:56,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.920e+01 9.626e+01 1.020e+02 1.318e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 20:28:02,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3557680.0, ans=0.0 2023-11-26 20:28:18,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3557813.3333333335, ans=0.125 2023-11-26 20:28:22,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3557813.3333333335, ans=0.2 2023-11-26 20:28:25,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3557813.3333333335, ans=0.125 2023-11-26 20:28:44,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.74 vs. 
limit=22.5 2023-11-26 20:28:45,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533700 2023-11-26 20:28:50,193 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4650, loss[loss=0.04173, simple_loss=0.05586, pruned_loss=0.004579, audio_tagging_loss=0.009216, over 15110.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09152, pruned_loss=0.01242, audio_tagging_loss=0.008657, over 3060183.94 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:28:59,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3558013.3333333335, ans=0.125 2023-11-26 20:29:04,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.17 vs. limit=8.0 2023-11-26 20:29:28,344 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:29:41,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3558280.0, ans=0.1 2023-11-26 20:29:42,057 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533750 2023-11-26 20:29:46,208 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4700, loss[loss=0.0727, simple_loss=0.0896, pruned_loss=0.01805, audio_tagging_loss=0.009852, over 14369.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08994, pruned_loss=0.01224, audio_tagging_loss=0.008815, over 3057196.24 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:29:50,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.876e+01 9.435e+01 1.008e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 20:30:20,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3558546.6666666665, ans=0.125 2023-11-26 20:30:31,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3558613.3333333335, ans=0.0 2023-11-26 20:30:36,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3558613.3333333335, ans=0.0 2023-11-26 20:30:36,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3558613.3333333335, ans=0.0 2023-11-26 20:30:37,103 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533800 2023-11-26 20:30:41,607 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4750, loss[loss=0.0745, simple_loss=0.1102, pruned_loss=0.01259, audio_tagging_loss=0.006828, over 15129.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08926, pruned_loss=0.01208, audio_tagging_loss=0.008947, over 3051943.45 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:30:48,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3558680.0, ans=0.125 2023-11-26 20:30:59,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3558746.6666666665, ans=0.125 2023-11-26 20:31:33,066 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533850 2023-11-26 20:31:38,354 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4800, loss[loss=0.0654, simple_loss=0.08391, pruned_loss=0.01322, audio_tagging_loss=0.01023, over 16038.00 frames. 
], tot_loss[loss=0.0653, simple_loss=0.08854, pruned_loss=0.01196, audio_tagging_loss=0.009075, over 3052637.51 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:31:42,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.037e+01 9.476e+01 1.008e+02 1.757e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 20:31:44,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3559013.3333333335, ans=0.125 2023-11-26 20:31:56,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3559080.0, ans=0.125 2023-11-26 20:32:07,374 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-26 20:32:16,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3559213.3333333335, ans=0.125 2023-11-26 20:32:27,849 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-26 20:32:29,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533900 2023-11-26 20:32:34,181 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4850, loss[loss=0.05343, simple_loss=0.07931, pruned_loss=0.006904, audio_tagging_loss=0.006874, over 14660.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08885, pruned_loss=0.01206, audio_tagging_loss=0.009135, over 3048501.29 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:32:42,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3559346.6666666665, ans=0.125 2023-11-26 20:32:43,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3559413.3333333335, ans=0.125 2023-11-26 20:32:45,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3559413.3333333335, ans=0.2 2023-11-26 20:32:54,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3559480.0, ans=0.0 2023-11-26 20:33:25,457 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533950 2023-11-26 20:33:29,628 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4900, loss[loss=0.08293, simple_loss=0.1187, pruned_loss=0.01645, audio_tagging_loss=0.007129, over 14494.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08835, pruned_loss=0.01196, audio_tagging_loss=0.009022, over 3050010.51 frames. 
], batch size: 52, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:33:34,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.987e+01 9.681e+01 1.025e+02 1.624e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-26 20:34:12,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3559880.0, ans=0.0 2023-11-26 20:34:13,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3559946.6666666665, ans=0.125 2023-11-26 20:34:14,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3559946.6666666665, ans=0.0 2023-11-26 20:34:19,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3559946.6666666665, ans=0.0 2023-11-26 20:34:20,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534000 2023-11-26 20:34:25,471 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4950, loss[loss=0.07351, simple_loss=0.09624, pruned_loss=0.01703, audio_tagging_loss=0.008354, over 14862.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08888, pruned_loss=0.01207, audio_tagging_loss=0.008921, over 3054501.60 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:34:27,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560013.3333333335, ans=0.1 2023-11-26 20:34:32,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3560013.3333333335, ans=0.0 2023-11-26 20:34:35,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3560080.0, ans=0.0 2023-11-26 20:34:36,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3560080.0, ans=0.125 2023-11-26 20:34:50,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3560146.6666666665, ans=0.2 2023-11-26 20:34:53,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3560146.6666666665, ans=0.0 2023-11-26 20:35:05,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3560213.3333333335, ans=0.2 2023-11-26 20:35:16,316 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534050 2023-11-26 20:35:20,449 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5000, loss[loss=0.09127, simple_loss=0.13, pruned_loss=0.02022, audio_tagging_loss=0.006022, over 15579.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09048, pruned_loss=0.01223, audio_tagging_loss=0.008672, over 3059422.41 frames. 
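The optim.py:476 entries above report five quantiles (min, Q1, median, Q3, max) of recently observed gradient norms together with a clipping threshold. The logged numbers are consistent with threshold = Clipping_scale * median (e.g. 2.0 * 9.681e+01 = 1.936e+02 in the entry above), so the sketch below assumes exactly that rule; the window size and class are illustrative, not the icefall optimizer:

# Sketch of quartile-based gradient clipping, assuming
# threshold = clipping_scale * median(recent grad norms).
import torch

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.scale, self.window, self.norms = clipping_scale, window, []

    def clip_(self, params) -> float:
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms = (self.norms + [norm])[-self.window:]   # sliding window
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()                # scale * median
        if norm > threshold:                                # rescale grads in place
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold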
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:35:21,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3560346.6666666665, ans=0.125 2023-11-26 20:35:24,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3560346.6666666665, ans=0.09899494936611666 2023-11-26 20:35:26,187 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.102e+01 9.598e+01 1.044e+02 1.473e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 20:35:34,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3560413.3333333335, ans=0.125 2023-11-26 20:35:38,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3560413.3333333335, ans=0.125 2023-11-26 20:35:38,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.17 vs. limit=10.0 2023-11-26 20:35:49,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3560480.0, ans=0.2 2023-11-26 20:35:51,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3560480.0, ans=0.125 2023-11-26 20:36:04,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3560613.3333333335, ans=0.0 2023-11-26 20:36:04,794 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5 2023-11-26 20:36:12,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534100 2023-11-26 20:36:12,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3560613.3333333335, ans=0.125 2023-11-26 20:36:16,212 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5050, loss[loss=0.06491, simple_loss=0.07906, pruned_loss=0.014, audio_tagging_loss=0.01138, over 13616.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09013, pruned_loss=0.01222, audio_tagging_loss=0.008743, over 3050472.69 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:36:34,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3560746.6666666665, ans=0.04949747468305833 2023-11-26 20:37:07,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534150 2023-11-26 20:37:10,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3560946.6666666665, ans=0.0 2023-11-26 20:37:12,195 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5100, loss[loss=0.07415, simple_loss=0.0964, pruned_loss=0.01757, audio_tagging_loss=0.008377, over 15574.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09031, pruned_loss=0.01224, audio_tagging_loss=0.008635, over 3058720.09 frames. 
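Each batch line above reports both the current batch's loss and a tot_loss aggregated "over N frames", where N is fractional (e.g. 3058720.09). A fractional frame count suggests the aggregate is a decayed, frame-weighted running average rather than a plain sum; the sketch below shows that reading, with the decay constant purely illustrative:

# Sketch of a frame-weighted running loss matching "tot_loss[..., over N frames]".
# Assumption: both the loss sum and the frame count decay geometrically,
# which is why the logged frame count is fractional.
class RunningLoss:
    def __init__(self, decay=0.999):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames

    def update(self, loss: float, num_frames: int):
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)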
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:37:18,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.922e+01 9.558e+01 1.035e+02 1.358e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 20:38:04,575 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534200 2023-11-26 20:38:07,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3561280.0, ans=10.0 2023-11-26 20:38:09,083 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5150, loss[loss=0.05296, simple_loss=0.06316, pruned_loss=0.01049, audio_tagging_loss=0.01089, over 14259.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08895, pruned_loss=0.01214, audio_tagging_loss=0.008683, over 3057085.84 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:38:46,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3561546.6666666665, ans=0.125 2023-11-26 20:38:52,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2023-11-26 20:39:00,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534250 2023-11-26 20:39:05,066 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5200, loss[loss=0.0483, simple_loss=0.06608, pruned_loss=0.007985, audio_tagging_loss=0.007276, over 14391.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08921, pruned_loss=0.01216, audio_tagging_loss=0.008651, over 3057274.37 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:39:05,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3561680.0, ans=0.1 2023-11-26 20:39:08,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3561680.0, ans=0.2 2023-11-26 20:39:10,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.817e+01 9.257e+01 9.950e+01 1.216e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 20:39:18,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3561746.6666666665, ans=0.125 2023-11-26 20:39:31,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3561813.3333333335, ans=0.125 2023-11-26 20:39:34,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3561813.3333333335, ans=0.125 2023-11-26 20:39:51,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2023-11-26 20:39:54,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. 
limit=15.0 2023-11-26 20:39:56,213 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534300 2023-11-26 20:39:56,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3561946.6666666665, ans=0.125 2023-11-26 20:40:00,359 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5250, loss[loss=0.05955, simple_loss=0.08181, pruned_loss=0.009886, audio_tagging_loss=0.008763, over 15833.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08962, pruned_loss=0.0122, audio_tagging_loss=0.008612, over 3055817.37 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:40:10,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562013.3333333335, ans=0.1 2023-11-26 20:40:48,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3562280.0, ans=0.125 2023-11-26 20:40:53,569 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534350 2023-11-26 20:40:57,734 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5300, loss[loss=0.05417, simple_loss=0.0732, pruned_loss=0.009293, audio_tagging_loss=0.008272, over 15922.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0897, pruned_loss=0.0122, audio_tagging_loss=0.008586, over 3057299.41 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:41:02,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 8.749e+01 9.362e+01 1.021e+02 1.179e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 20:41:07,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3562413.3333333335, ans=0.0 2023-11-26 20:41:18,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3562480.0, ans=0.2 2023-11-26 20:41:23,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-11-26 20:41:26,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3562480.0, ans=0.125 2023-11-26 20:41:29,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3562546.6666666665, ans=0.125 2023-11-26 20:41:34,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3562546.6666666665, ans=0.035 2023-11-26 20:41:48,765 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534400 2023-11-26 20:41:53,340 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5350, loss[loss=0.07678, simple_loss=0.1014, pruned_loss=0.01731, audio_tagging_loss=0.008788, over 14781.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09053, pruned_loss=0.01239, audio_tagging_loss=0.008522, over 3049411.06 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:42:00,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.57 vs. 
limit=12.0 2023-11-26 20:42:07,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3562746.6666666665, ans=0.1 2023-11-26 20:42:23,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3562813.3333333335, ans=0.0 2023-11-26 20:42:40,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3562946.6666666665, ans=0.125 2023-11-26 20:42:45,093 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534450 2023-11-26 20:42:48,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2023-11-26 20:42:49,323 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5400, loss[loss=0.08302, simple_loss=0.1123, pruned_loss=0.01941, audio_tagging_loss=0.007475, over 14556.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09053, pruned_loss=0.01241, audio_tagging_loss=0.008578, over 3046444.32 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:42:55,198 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:42:55,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3563013.3333333335, ans=0.125 2023-11-26 20:42:56,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.834e+01 9.520e+01 1.043e+02 1.175e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 20:43:00,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.03 vs. limit=22.5 2023-11-26 20:43:01,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3563080.0, ans=0.0 2023-11-26 20:43:09,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3563080.0, ans=0.125 2023-11-26 20:43:13,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3563146.6666666665, ans=0.125 2023-11-26 20:43:16,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3563146.6666666665, ans=0.09899494936611666 2023-11-26 20:43:37,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3563280.0, ans=0.125 2023-11-26 20:43:37,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3563280.0, ans=0.0 2023-11-26 20:43:42,006 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534500 2023-11-26 20:43:46,136 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5450, loss[loss=0.06955, simple_loss=0.09041, pruned_loss=0.01469, audio_tagging_loss=0.009655, over 14904.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09084, pruned_loss=0.01256, audio_tagging_loss=0.008625, over 3041953.51 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:43:47,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=22.5 2023-11-26 20:43:51,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3563346.6666666665, ans=0.125 2023-11-26 20:44:33,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=12.0 2023-11-26 20:44:37,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534550 2023-11-26 20:44:37,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3563613.3333333335, ans=0.0 2023-11-26 20:44:41,539 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5500, loss[loss=0.06993, simple_loss=0.09429, pruned_loss=0.01369, audio_tagging_loss=0.009092, over 14696.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09009, pruned_loss=0.01251, audio_tagging_loss=0.008675, over 3043428.83 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:44:47,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.118e+01 9.897e+01 1.074e+02 1.555e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-26 20:44:50,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3563680.0, ans=0.125 2023-11-26 20:45:32,791 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534600 2023-11-26 20:45:37,296 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5550, loss[loss=0.05655, simple_loss=0.07418, pruned_loss=0.01085, audio_tagging_loss=0.008605, over 17098.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08972, pruned_loss=0.0125, audio_tagging_loss=0.008778, over 3046027.43 frames. ], batch size: 66, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:45:39,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3564013.3333333335, ans=0.125 2023-11-26 20:45:44,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2023-11-26 20:45:45,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.83 vs. 
limit=22.5 2023-11-26 20:45:51,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3564080.0, ans=0.0 2023-11-26 20:46:00,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3564146.6666666665, ans=0.125 2023-11-26 20:46:11,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3564213.3333333335, ans=0.125 2023-11-26 20:46:16,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3564213.3333333335, ans=0.125 2023-11-26 20:46:20,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3564280.0, ans=0.0 2023-11-26 20:46:26,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3564280.0, ans=0.2 2023-11-26 20:46:29,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534650 2023-11-26 20:46:32,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=8.0 2023-11-26 20:46:34,582 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5600, loss[loss=0.08122, simple_loss=0.1117, pruned_loss=0.01421, audio_tagging_loss=0.01115, over 15469.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08988, pruned_loss=0.01238, audio_tagging_loss=0.00883, over 3046068.32 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:46:39,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3564346.6666666665, ans=0.025 2023-11-26 20:46:40,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.848e+01 9.516e+01 1.047e+02 1.275e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 20:46:43,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-26 20:46:54,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3564413.3333333335, ans=0.125 2023-11-26 20:46:56,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3564480.0, ans=0.1 2023-11-26 20:47:02,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3564480.0, ans=0.1 2023-11-26 20:47:14,939 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:47:19,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3564613.3333333335, ans=0.125 2023-11-26 20:47:19,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. 
limit=15.0 2023-11-26 20:47:25,538 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534700 2023-11-26 20:47:29,696 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5650, loss[loss=0.04506, simple_loss=0.05701, pruned_loss=0.007446, audio_tagging_loss=0.00911, over 14336.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09023, pruned_loss=0.01236, audio_tagging_loss=0.008869, over 3050454.38 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:47:37,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3564680.0, ans=0.0 2023-11-26 20:48:20,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2023-11-26 20:48:21,012 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534750 2023-11-26 20:48:25,211 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5700, loss[loss=0.05719, simple_loss=0.07623, pruned_loss=0.01149, audio_tagging_loss=0.007586, over 15660.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08979, pruned_loss=0.0123, audio_tagging_loss=0.008803, over 3054605.05 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:48:27,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3565013.3333333335, ans=0.125 2023-11-26 20:48:27,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3565013.3333333335, ans=0.0 2023-11-26 20:48:28,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3565013.3333333335, ans=0.0 2023-11-26 20:48:33,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.707e+01 9.299e+01 1.009e+02 1.151e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 20:49:03,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565213.3333333335, ans=0.1 2023-11-26 20:49:05,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3565213.3333333335, ans=0.0 2023-11-26 20:49:15,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3565280.0, ans=0.2 2023-11-26 20:49:16,925 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534800 2023-11-26 20:49:21,914 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5750, loss[loss=0.06886, simple_loss=0.09135, pruned_loss=0.01276, audio_tagging_loss=0.01042, over 15804.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08995, pruned_loss=0.01227, audio_tagging_loss=0.008726, over 3051459.86 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:49:28,717 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. 
limit=15.0 2023-11-26 20:49:29,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3565346.6666666665, ans=0.0 2023-11-26 20:49:51,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3565480.0, ans=0.1 2023-11-26 20:49:56,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2023-11-26 20:50:10,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2023-11-26 20:50:12,609 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-26 20:50:16,756 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5800, loss[loss=0.07514, simple_loss=0.1053, pruned_loss=0.01548, audio_tagging_loss=0.007023, over 14545.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08977, pruned_loss=0.0122, audio_tagging_loss=0.008588, over 3052842.34 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:50:21,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3565680.0, ans=0.07 2023-11-26 20:50:24,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.906e+01 9.529e+01 1.040e+02 1.512e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 20:50:37,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3565813.3333333335, ans=0.0 2023-11-26 20:51:07,130 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-26 20:51:10,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3566013.3333333335, ans=0.0 2023-11-26 20:51:11,329 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5850, loss[loss=0.06652, simple_loss=0.09209, pruned_loss=0.01319, audio_tagging_loss=0.007284, over 15094.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0894, pruned_loss=0.0123, audio_tagging_loss=0.008488, over 3046928.02 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:51:21,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566080.0, ans=0.1 2023-11-26 20:51:34,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3566146.6666666665, ans=0.2 2023-11-26 20:51:36,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2023-11-26 20:51:50,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2023-11-26 20:51:52,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3566213.3333333335, ans=0.125 2023-11-26 20:52:01,264 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534950 2023-11-26 20:52:05,960 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5900, loss[loss=0.04461, simple_loss=0.05671, pruned_loss=0.005436, audio_tagging_loss=0.01082, over 14228.00 frames. 
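The per-batch loss decomposes into simple_loss, pruned_loss, and audio_tagging_loss. The logged totals are reproduced by weighting the simple loss by 0.5 and the other two terms by 1.0, as the arithmetic check below shows for the batch 5900 line above; the exact scales live in the training configuration, so treat the constants here as inferred from the log rather than authoritative:

# Sketch of combining the logged loss components into the logged total, assuming
#   total = simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss
def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_scale=0.5, at_scale=1.0):
    return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

# Batch 5900 above: 0.5*0.05671 + 0.005436 + 1.0*0.01082 = 0.044611,
# matching the logged loss=0.04461.
print(combine_losses(0.05671, 0.005436, 0.01082))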
], tot_loss[loss=0.06521, simple_loss=0.08893, pruned_loss=0.01223, audio_tagging_loss=0.008519, over 3041963.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:52:13,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-26 20:52:14,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.767e+01 9.381e+01 1.012e+02 1.422e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 20:52:17,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3566413.3333333335, ans=0.125 2023-11-26 20:52:24,332 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:52:31,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-26 20:52:48,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3566546.6666666665, ans=0.125 2023-11-26 20:52:55,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3566613.3333333335, ans=0.125 2023-11-26 20:52:57,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535000 2023-11-26 20:53:02,261 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5950, loss[loss=0.04466, simple_loss=0.05041, pruned_loss=0.008609, audio_tagging_loss=0.01085, over 14880.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09012, pruned_loss=0.01233, audio_tagging_loss=0.008468, over 3046297.07 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:53:29,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3566813.3333333335, ans=0.125 2023-11-26 20:53:52,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3566946.6666666665, ans=0.0 2023-11-26 20:53:53,247 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535050 2023-11-26 20:53:53,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-26 20:53:57,416 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6000, loss[loss=0.07616, simple_loss=0.09908, pruned_loss=0.0192, audio_tagging_loss=0.007424, over 15424.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08973, pruned_loss=0.01226, audio_tagging_loss=0.008461, over 3048763.23 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:53:57,417 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 20:54:29,562 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05766, simple_loss=0.05058, pruned_loss=0.005348, audio_tagging_loss=0.02702, over 4681554.00 frames. 
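At batch 6000 the loop pauses to compute validation loss over the full dev set and then logs the peak GPU memory. A hedged sketch of that reporting step, assuming standard PyTorch APIs ("Maximum memory allocated" maps onto torch.cuda.max_memory_allocated()); compute_loss is a hypothetical helper standing in for the real loss computation:

# Sketch of periodic validation + peak-memory reporting.
import logging, torch

def report_validation(model, valid_loader, compute_loss):
    model.eval()
    tot, frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, n = compute_loss(model, batch)   # hypothetical helper
            tot, frames = tot + loss * n, frames + n
    logging.info(f"validation: loss={tot / max(frames, 1):.4g}, over {frames} frames.")
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    logging.info(f"Maximum memory allocated so far is {mb}MB")
    model.train()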
2023-11-26 20:54:29,563 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 20:54:34,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3567013.3333333335, ans=0.0 2023-11-26 20:54:35,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2023-11-26 20:54:37,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.765e+01 9.407e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 20:54:37,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3567013.3333333335, ans=0.125 2023-11-26 20:54:38,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3567013.3333333335, ans=0.1 2023-11-26 20:54:48,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3567080.0, ans=0.1 2023-11-26 20:54:50,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2023-11-26 20:55:09,198 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:55:20,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535100 2023-11-26 20:55:25,121 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6050, loss[loss=0.04713, simple_loss=0.04849, pruned_loss=0.004208, audio_tagging_loss=0.01867, over 15748.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08965, pruned_loss=0.01222, audio_tagging_loss=0.0085, over 3049016.60 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:55:40,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3567413.3333333335, ans=0.125 2023-11-26 20:55:41,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3567413.3333333335, ans=0.125 2023-11-26 20:55:45,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3567480.0, ans=0.1 2023-11-26 20:55:53,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0 2023-11-26 20:55:58,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3567546.6666666665, ans=0.125 2023-11-26 20:56:16,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535150 2023-11-26 20:56:20,770 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6100, loss[loss=0.04584, simple_loss=0.05791, pruned_loss=0.007048, audio_tagging_loss=0.009842, over 13602.00 frames. 
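The WARNING just above excludes an AudioSet cut whose placeholder transcript is longer than its acoustic sequence: 100 input frames become 23 after subsampling, fewer than the 24 tokens, so the transducer alignment is undefined. The sketch below reproduces that criterion; the subsampling arithmetic is one plausible convolutional-frontend formula that happens to match the logged 100 -> 23, not necessarily the exact model code:

# Sketch of the cut filter behind the "Exclude cut" warnings.
def frames_after_subsampling(num_frames: int) -> int:
    # assumed Conv2d frontend arithmetic for subsampling factor 4
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a cut is trainable only if it has at least one frame per output token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))   # -> 23, matching the warning
print(keep_cut(100, 24))               # -> False: excluded from training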
], tot_loss[loss=0.0655, simple_loss=0.08958, pruned_loss=0.01223, audio_tagging_loss=0.008487, over 3046590.88 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:56:28,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.965e+01 9.690e+01 1.035e+02 1.368e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 20:56:50,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3567813.3333333335, ans=0.2 2023-11-26 20:57:06,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3567946.6666666665, ans=0.125 2023-11-26 20:57:08,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=15.0 2023-11-26 20:57:11,490 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535200 2023-11-26 20:57:13,314 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.40 vs. limit=8.0 2023-11-26 20:57:17,029 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6150, loss[loss=0.07835, simple_loss=0.1008, pruned_loss=0.02048, audio_tagging_loss=0.007488, over 14322.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08903, pruned_loss=0.01211, audio_tagging_loss=0.008579, over 3042784.61 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:57:19,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3568013.3333333335, ans=0.0 2023-11-26 20:57:20,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2023-11-26 20:57:23,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3568013.3333333335, ans=0.125 2023-11-26 20:57:26,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3568013.3333333335, ans=0.0 2023-11-26 20:57:31,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3568080.0, ans=0.125 2023-11-26 20:57:32,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-26 20:57:35,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3568080.0, ans=0.125 2023-11-26 20:57:40,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3568146.6666666665, ans=0.125 2023-11-26 20:58:04,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3568280.0, ans=0.025 2023-11-26 20:58:08,717 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535250 2023-11-26 20:58:12,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2023-11-26 20:58:13,468 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6200, loss[loss=0.07908, simple_loss=0.1118, pruned_loss=0.01495, audio_tagging_loss=0.008225, over 15290.00 frames. 
], tot_loss[loss=0.06522, simple_loss=0.08891, pruned_loss=0.01208, audio_tagging_loss=0.008678, over 3038624.16 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:58:18,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3568346.6666666665, ans=0.125 2023-11-26 20:58:20,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.899e+01 9.421e+01 1.012e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 20:58:27,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3568413.3333333335, ans=0.0 2023-11-26 20:58:28,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3568413.3333333335, ans=10.0 2023-11-26 20:58:28,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3568413.3333333335, ans=0.125 2023-11-26 20:58:43,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2023-11-26 20:58:54,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3568546.6666666665, ans=0.1 2023-11-26 20:59:03,986 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535300 2023-11-26 20:59:08,231 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6250, loss[loss=0.05284, simple_loss=0.06779, pruned_loss=0.009128, audio_tagging_loss=0.00982, over 16954.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08931, pruned_loss=0.0122, audio_tagging_loss=0.00881, over 3034939.75 frames. ], batch size: 69, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:59:10,684 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=22.5 2023-11-26 20:59:15,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3568680.0, ans=0.125 2023-11-26 20:59:33,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3568813.3333333335, ans=10.0 2023-11-26 20:59:36,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3568813.3333333335, ans=0.125 2023-11-26 20:59:38,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3568813.3333333335, ans=0.2 2023-11-26 20:59:45,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. 
limit=15.0 2023-11-26 20:59:55,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3568946.6666666665, ans=0.125 2023-11-26 20:59:58,304 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535350 2023-11-26 20:59:59,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3568946.6666666665, ans=0.5 2023-11-26 20:59:59,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3568946.6666666665, ans=0.1 2023-11-26 21:00:01,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3569013.3333333335, ans=0.2 2023-11-26 21:00:02,481 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6300, loss[loss=0.05109, simple_loss=0.07111, pruned_loss=0.008031, audio_tagging_loss=0.007503, over 14891.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08966, pruned_loss=0.01207, audio_tagging_loss=0.008861, over 3032311.63 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:00:12,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.840e+01 9.586e+01 1.026e+02 1.198e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:00:25,578 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:00:35,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2023-11-26 21:00:54,203 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535400 2023-11-26 21:00:58,579 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6350, loss[loss=0.047, simple_loss=0.06729, pruned_loss=0.005366, audio_tagging_loss=0.007989, over 15407.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09029, pruned_loss=0.01229, audio_tagging_loss=0.008801, over 3036698.77 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:01:02,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3569346.6666666665, ans=0.0 2023-11-26 21:01:04,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3569346.6666666665, ans=0.0 2023-11-26 21:01:25,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3569480.0, ans=0.125 2023-11-26 21:01:36,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2023-11-26 21:01:42,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2023-11-26 21:01:44,279 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2023-11-26 21:01:49,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535450 2023-11-26 21:01:53,951 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6400, loss[loss=0.0726, simple_loss=0.09467, pruned_loss=0.01627, audio_tagging_loss=0.009001, over 14918.00 frames. 
], tot_loss[loss=0.06572, simple_loss=0.08937, pruned_loss=0.01209, audio_tagging_loss=0.00895, over 3029758.46 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:02:02,596 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.580e+01 9.385e+01 1.005e+02 1.222e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 21:02:03,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2023-11-26 21:02:20,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3569813.3333333335, ans=0.0 2023-11-26 21:02:24,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=22.5 2023-11-26 21:02:34,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3569880.0, ans=0.1 2023-11-26 21:02:36,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3569880.0, ans=0.125 2023-11-26 21:02:44,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535500 2023-11-26 21:02:46,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3569946.6666666665, ans=0.1 2023-11-26 21:02:48,864 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6450, loss[loss=0.05978, simple_loss=0.07713, pruned_loss=0.01104, audio_tagging_loss=0.01017, over 14493.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08883, pruned_loss=0.01195, audio_tagging_loss=0.009043, over 3036182.95 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:02:55,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3570013.3333333335, ans=0.2 2023-11-26 21:02:57,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3570013.3333333335, ans=0.125 2023-11-26 21:02:58,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3570013.3333333335, ans=0.2 2023-11-26 21:03:04,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=12.0 2023-11-26 21:03:04,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2023-11-26 21:03:29,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3570213.3333333335, ans=0.025 2023-11-26 21:03:40,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535550 2023-11-26 21:03:42,985 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:03:44,898 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6500, loss[loss=0.06167, simple_loss=0.08642, pruned_loss=0.01109, audio_tagging_loss=0.007372, over 16476.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08895, pruned_loss=0.01209, audio_tagging_loss=0.009094, over 3034508.15 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:03:47,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3570346.6666666665, ans=0.0 2023-11-26 21:03:53,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.670e+01 9.516e+01 1.047e+02 1.238e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 21:03:56,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3570413.3333333335, ans=0.125 2023-11-26 21:03:57,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-26 21:04:00,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3570413.3333333335, ans=0.2 2023-11-26 21:04:18,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3570546.6666666665, ans=0.0 2023-11-26 21:04:27,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3570546.6666666665, ans=0.2 2023-11-26 21:04:28,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3570613.3333333335, ans=0.125 2023-11-26 21:04:35,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535600 2023-11-26 21:04:39,833 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6550, loss[loss=0.06682, simple_loss=0.09254, pruned_loss=0.01304, audio_tagging_loss=0.00751, over 15143.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08971, pruned_loss=0.01228, audio_tagging_loss=0.008955, over 3035268.66 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:05:17,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3570880.0, ans=0.015 2023-11-26 21:05:18,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3570880.0, ans=0.07 2023-11-26 21:05:31,083 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535650 2023-11-26 21:05:35,372 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6600, loss[loss=0.06121, simple_loss=0.07859, pruned_loss=0.01337, audio_tagging_loss=0.008547, over 15444.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09023, pruned_loss=0.0124, audio_tagging_loss=0.008831, over 3035243.11 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:05:36,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3571013.3333333335, ans=0.07 2023-11-26 21:05:44,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.935e+01 9.455e+01 1.019e+02 1.266e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 21:06:05,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3571146.6666666665, ans=0.2 2023-11-26 21:06:15,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. 
limit=22.5 2023-11-26 21:06:26,879 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535700 2023-11-26 21:06:27,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3571280.0, ans=0.125 2023-11-26 21:06:31,004 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6650, loss[loss=0.06323, simple_loss=0.08524, pruned_loss=0.01244, audio_tagging_loss=0.008175, over 14527.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09004, pruned_loss=0.01235, audio_tagging_loss=0.00875, over 3032361.46 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:06:34,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3571346.6666666665, ans=0.125 2023-11-26 21:06:37,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.24 vs. limit=22.5 2023-11-26 21:06:38,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3571346.6666666665, ans=0.015 2023-11-26 21:06:46,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3571413.3333333335, ans=0.0 2023-11-26 21:06:50,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3571413.3333333335, ans=0.1 2023-11-26 21:07:02,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3571546.6666666665, ans=0.125 2023-11-26 21:07:10,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3571546.6666666665, ans=0.1 2023-11-26 21:07:18,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3571613.3333333335, ans=0.125 2023-11-26 21:07:21,147 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535750 2023-11-26 21:07:21,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3571613.3333333335, ans=0.125 2023-11-26 21:07:23,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3571613.3333333335, ans=0.125 2023-11-26 21:07:25,287 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6700, loss[loss=0.05088, simple_loss=0.06503, pruned_loss=0.008097, audio_tagging_loss=0.01027, over 16093.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0897, pruned_loss=0.01228, audio_tagging_loss=0.008632, over 3028069.33 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:07:31,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.80 vs. 
limit=22.5 2023-11-26 21:07:34,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.689e+01 9.559e+01 1.023e+02 3.616e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-26 21:07:34,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3571746.6666666665, ans=0.0 2023-11-26 21:07:36,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3571746.6666666665, ans=0.125 2023-11-26 21:08:03,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=22.5 2023-11-26 21:08:05,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3571880.0, ans=0.125 2023-11-26 21:08:15,811 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535800 2023-11-26 21:08:20,251 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6750, loss[loss=0.06962, simple_loss=0.08864, pruned_loss=0.01719, audio_tagging_loss=0.008111, over 14890.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08965, pruned_loss=0.0124, audio_tagging_loss=0.008595, over 3025861.84 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:08:48,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2023-11-26 21:09:04,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3572280.0, ans=0.125 2023-11-26 21:09:11,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535850 2023-11-26 21:09:14,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3572280.0, ans=0.2 2023-11-26 21:09:16,581 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6800, loss[loss=0.06293, simple_loss=0.08576, pruned_loss=0.009977, audio_tagging_loss=0.01007, over 15777.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09068, pruned_loss=0.0126, audio_tagging_loss=0.008499, over 3034612.18 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:09:20,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3572346.6666666665, ans=0.125 2023-11-26 21:09:23,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3572346.6666666665, ans=0.07 2023-11-26 21:09:26,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.420e+01 1.023e+02 1.274e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 21:09:26,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. 
limit=15.0 2023-11-26 21:09:30,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3572413.3333333335, ans=0.125 2023-11-26 21:09:42,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3572480.0, ans=0.05 2023-11-26 21:09:53,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3572546.6666666665, ans=0.0 2023-11-26 21:10:01,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3572613.3333333335, ans=0.125 2023-11-26 21:10:07,137 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535900 2023-11-26 21:10:11,382 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6850, loss[loss=0.05774, simple_loss=0.07614, pruned_loss=0.008831, audio_tagging_loss=0.01084, over 15079.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08973, pruned_loss=0.01241, audio_tagging_loss=0.008512, over 3025562.47 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:10:11,606 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:10:16,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3572680.0, ans=0.125 2023-11-26 21:10:19,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0 2023-11-26 21:10:20,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3572680.0, ans=0.2 2023-11-26 21:10:27,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3572746.6666666665, ans=0.1 2023-11-26 21:10:38,824 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:10:44,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3572880.0, ans=0.125 2023-11-26 21:10:53,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=22.5 2023-11-26 21:11:02,648 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535950 2023-11-26 21:11:06,908 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6900, loss[loss=0.05965, simple_loss=0.08211, pruned_loss=0.009802, audio_tagging_loss=0.00879, over 15300.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08971, pruned_loss=0.01223, audio_tagging_loss=0.00842, over 3029345.70 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:11:18,673 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.747e+01 9.465e+01 1.018e+02 1.501e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 21:11:32,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3573146.6666666665, ans=0.125 2023-11-26 21:11:36,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.78 vs. 
limit=22.5 2023-11-26 21:11:46,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3573213.3333333335, ans=0.0 2023-11-26 21:11:50,321 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:11:57,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536000 2023-11-26 21:12:05,230 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6950, loss[loss=0.08415, simple_loss=0.1097, pruned_loss=0.01864, audio_tagging_loss=0.01069, over 16724.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08948, pruned_loss=0.01212, audio_tagging_loss=0.008617, over 3037329.06 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:12:17,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3573413.3333333335, ans=0.1 2023-11-26 21:12:26,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3573480.0, ans=0.0 2023-11-26 21:12:27,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3573480.0, ans=0.2 2023-11-26 21:12:32,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2023-11-26 21:12:56,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536050 2023-11-26 21:13:00,997 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7000, loss[loss=0.05997, simple_loss=0.07518, pruned_loss=0.01228, audio_tagging_loss=0.0101, over 14464.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0894, pruned_loss=0.012, audio_tagging_loss=0.008686, over 3040260.57 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:13:12,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.901e+01 9.470e+01 1.019e+02 1.225e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 21:13:13,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3573746.6666666665, ans=0.125 2023-11-26 21:13:17,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3573746.6666666665, ans=0.1 2023-11-26 21:13:23,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.61 vs. 
limit=15.0 2023-11-26 21:13:27,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3573813.3333333335, ans=0.035 2023-11-26 21:13:38,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3573880.0, ans=22.5 2023-11-26 21:13:45,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3573946.6666666665, ans=0.125 2023-11-26 21:13:51,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536100 2023-11-26 21:13:51,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3573946.6666666665, ans=0.0 2023-11-26 21:13:54,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-11-26 21:13:56,031 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7050, loss[loss=0.07125, simple_loss=0.09228, pruned_loss=0.01674, audio_tagging_loss=0.008369, over 16274.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08897, pruned_loss=0.01205, audio_tagging_loss=0.008796, over 3039784.65 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:14:09,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3574080.0, ans=0.0 2023-11-26 21:14:17,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3574146.6666666665, ans=0.2 2023-11-26 21:14:21,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3574146.6666666665, ans=0.1 2023-11-26 21:14:45,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2023-11-26 21:14:46,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536150 2023-11-26 21:14:47,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3574280.0, ans=0.0 2023-11-26 21:14:51,223 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7100, loss[loss=0.0544, simple_loss=0.06695, pruned_loss=0.01125, audio_tagging_loss=0.009682, over 14598.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08908, pruned_loss=0.01208, audio_tagging_loss=0.008756, over 3048763.04 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:15:00,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3574346.6666666665, ans=0.0 2023-11-26 21:15:04,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.863e+01 9.458e+01 1.036e+02 1.512e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 21:15:06,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3574413.3333333335, ans=0.125 2023-11-26 21:15:13,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. 
limit=15.0 2023-11-26 21:15:21,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3574480.0, ans=0.125 2023-11-26 21:15:35,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3574613.3333333335, ans=0.0 2023-11-26 21:15:43,268 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536200 2023-11-26 21:15:47,677 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7150, loss[loss=0.07106, simple_loss=0.101, pruned_loss=0.01414, audio_tagging_loss=0.00641, over 14516.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08942, pruned_loss=0.01222, audio_tagging_loss=0.008778, over 3049627.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:15:49,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3574680.0, ans=0.125 2023-11-26 21:16:37,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536250 2023-11-26 21:16:42,047 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7200, loss[loss=0.06822, simple_loss=0.09742, pruned_loss=0.01241, audio_tagging_loss=0.007106, over 17497.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08976, pruned_loss=0.01221, audio_tagging_loss=0.00884, over 3057562.05 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:16:53,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.982e+01 9.532e+01 1.041e+02 1.325e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 21:16:53,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3575080.0, ans=0.2 2023-11-26 21:16:54,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2023-11-26 21:17:01,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3575080.0, ans=0.0 2023-11-26 21:17:06,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3575146.6666666665, ans=0.035 2023-11-26 21:17:32,470 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536300 2023-11-26 21:17:36,679 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7250, loss[loss=0.06108, simple_loss=0.08797, pruned_loss=0.00731, audio_tagging_loss=0.009787, over 14982.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08969, pruned_loss=0.01218, audio_tagging_loss=0.008862, over 3056135.88 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:17:45,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3575346.6666666665, ans=0.125 2023-11-26 21:17:51,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3575413.3333333335, ans=0.05 2023-11-26 21:18:11,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3575546.6666666665, ans=0.125 2023-11-26 21:18:12,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.21 vs. 
limit=22.5 2023-11-26 21:18:28,378 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536350 2023-11-26 21:18:33,113 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7300, loss[loss=0.07132, simple_loss=0.09967, pruned_loss=0.01442, audio_tagging_loss=0.007072, over 15723.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08975, pruned_loss=0.01217, audio_tagging_loss=0.008695, over 3052467.93 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:18:43,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3575746.6666666665, ans=15.0 2023-11-26 21:18:45,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.748e+01 9.464e+01 1.022e+02 1.262e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 21:18:50,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-26 21:18:51,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2023-11-26 21:19:03,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3575813.3333333335, ans=0.125 2023-11-26 21:19:03,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3575813.3333333335, ans=0.0 2023-11-26 21:19:07,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3575880.0, ans=0.07 2023-11-26 21:19:18,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3575946.6666666665, ans=0.05 2023-11-26 21:19:19,605 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:19:23,602 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536400 2023-11-26 21:19:28,017 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7350, loss[loss=0.05966, simple_loss=0.08306, pruned_loss=0.01118, audio_tagging_loss=0.006952, over 14995.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08987, pruned_loss=0.01218, audio_tagging_loss=0.008632, over 3056032.42 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:19:28,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2023-11-26 21:19:49,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3576146.6666666665, ans=0.125 2023-11-26 21:19:50,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3576146.6666666665, ans=0.125 2023-11-26 21:20:07,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. 
limit=15.0 2023-11-26 21:20:13,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3576280.0, ans=0.09899494936611666 2023-11-26 21:20:18,443 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536450 2023-11-26 21:20:22,636 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7400, loss[loss=0.04832, simple_loss=0.06272, pruned_loss=0.007785, audio_tagging_loss=0.009181, over 15211.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08934, pruned_loss=0.01193, audio_tagging_loss=0.008582, over 3051114.98 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:20:36,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.979e+01 9.560e+01 1.029e+02 2.303e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-26 21:21:11,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3576613.3333333335, ans=0.1 2023-11-26 21:21:14,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536500 2023-11-26 21:21:14,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576613.3333333335, ans=0.1 2023-11-26 21:21:18,746 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7450, loss[loss=0.06127, simple_loss=0.08762, pruned_loss=0.008278, audio_tagging_loss=0.009177, over 15353.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08878, pruned_loss=0.01192, audio_tagging_loss=0.008493, over 3048433.03 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:21:20,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3576680.0, ans=0.0 2023-11-26 21:21:22,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3576680.0, ans=0.125 2023-11-26 21:21:27,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2023-11-26 21:21:41,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3576813.3333333335, ans=0.0 2023-11-26 21:21:56,829 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:22:03,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576946.6666666665, ans=0.1 2023-11-26 21:22:07,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3576946.6666666665, ans=0.125 2023-11-26 21:22:09,793 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536550 2023-11-26 21:22:13,917 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7500, loss[loss=0.06208, simple_loss=0.09194, pruned_loss=0.01165, audio_tagging_loss=0.004465, over 14283.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08846, pruned_loss=0.01187, audio_tagging_loss=0.008538, over 3046342.95 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:22:16,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3577013.3333333335, ans=0.1 2023-11-26 21:22:18,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3577013.3333333335, ans=0.2 2023-11-26 21:22:24,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3577080.0, ans=0.2 2023-11-26 21:22:26,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.830e+01 9.434e+01 1.016e+02 1.615e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 21:22:50,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3577213.3333333335, ans=0.125 2023-11-26 21:22:53,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3577213.3333333335, ans=0.125 2023-11-26 21:23:04,447 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536600 2023-11-26 21:23:08,918 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7550, loss[loss=0.07516, simple_loss=0.1092, pruned_loss=0.01374, audio_tagging_loss=0.006822, over 15457.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08882, pruned_loss=0.01189, audio_tagging_loss=0.008417, over 3047762.64 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:23:12,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3577346.6666666665, ans=0.0 2023-11-26 21:23:21,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.68 vs. limit=10.0 2023-11-26 21:23:27,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3577413.3333333335, ans=0.05 2023-11-26 21:23:40,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3577480.0, ans=0.0 2023-11-26 21:23:46,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-11-26 21:23:49,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3577546.6666666665, ans=0.2 2023-11-26 21:23:54,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3577613.3333333335, ans=0.125 2023-11-26 21:24:00,006 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536650 2023-11-26 21:24:04,763 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7600, loss[loss=0.06913, simple_loss=0.1008, pruned_loss=0.0113, audio_tagging_loss=0.007415, over 14977.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08905, pruned_loss=0.01185, audio_tagging_loss=0.00836, over 3050874.62 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:24:06,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. 
limit=22.5 2023-11-26 21:24:14,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2023-11-26 21:24:17,497 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.794e+01 9.367e+01 9.817e+01 1.272e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 21:24:17,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3577746.6666666665, ans=0.2 2023-11-26 21:24:22,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0 2023-11-26 21:24:34,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3577813.3333333335, ans=0.125 2023-11-26 21:24:55,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3577946.6666666665, ans=0.125 2023-11-26 21:24:56,180 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536700 2023-11-26 21:24:59,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3578013.3333333335, ans=0.0 2023-11-26 21:25:00,344 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7650, loss[loss=0.05881, simple_loss=0.08393, pruned_loss=0.01015, audio_tagging_loss=0.006699, over 14629.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08793, pruned_loss=0.01164, audio_tagging_loss=0.008437, over 3046100.43 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:25:13,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3578080.0, ans=0.0 2023-11-26 21:25:32,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3578146.6666666665, ans=0.1 2023-11-26 21:25:52,128 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536750 2023-11-26 21:25:53,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3578280.0, ans=0.125 2023-11-26 21:25:56,376 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7700, loss[loss=0.06677, simple_loss=0.08685, pruned_loss=0.0129, audio_tagging_loss=0.01044, over 15332.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08863, pruned_loss=0.01169, audio_tagging_loss=0.008502, over 3054512.48 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:26:10,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.777e+01 9.451e+01 1.024e+02 1.236e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 21:26:41,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3578613.3333333335, ans=0.1 2023-11-26 21:26:42,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3578613.3333333335, ans=0.125 2023-11-26 21:26:44,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3578613.3333333335, ans=0.1 2023-11-26 21:26:47,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536800 2023-11-26 21:26:52,884 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7750, loss[loss=0.0488, simple_loss=0.06113, pruned_loss=0.009793, audio_tagging_loss=0.008438, over 15275.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08942, pruned_loss=0.01183, audio_tagging_loss=0.008542, over 3048549.14 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:27:01,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3578680.0, ans=0.0 2023-11-26 21:27:20,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-11-26 21:27:44,494 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536850 2023-11-26 21:27:47,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3579013.3333333335, ans=0.125 2023-11-26 21:27:48,647 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7800, loss[loss=0.06312, simple_loss=0.08356, pruned_loss=0.01184, audio_tagging_loss=0.009502, over 15533.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08986, pruned_loss=0.01192, audio_tagging_loss=0.008652, over 3047974.26 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:27:51,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3579013.3333333335, ans=0.1 2023-11-26 21:28:01,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 9.125e+01 9.673e+01 1.032e+02 1.227e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 21:28:12,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-26 21:28:25,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3579213.3333333335, ans=0.125 2023-11-26 21:28:39,519 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536900 2023-11-26 21:28:43,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3579346.6666666665, ans=0.125 2023-11-26 21:28:44,305 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7850, loss[loss=0.08564, simple_loss=0.1178, pruned_loss=0.02129, audio_tagging_loss=0.005471, over 15176.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08999, pruned_loss=0.01193, audio_tagging_loss=0.00869, over 3052123.21 frames. 
], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:29:05,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3579413.3333333335, ans=0.125 2023-11-26 21:29:33,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3579613.3333333335, ans=0.125 2023-11-26 21:29:35,295 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536950 2023-11-26 21:29:38,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3579613.3333333335, ans=0.2 2023-11-26 21:29:39,960 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7900, loss[loss=0.06787, simple_loss=0.09828, pruned_loss=0.01017, audio_tagging_loss=0.008559, over 15748.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09011, pruned_loss=0.012, audio_tagging_loss=0.008773, over 3047644.86 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:29:53,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.961e+01 9.633e+01 1.012e+02 1.259e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 21:30:00,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3579746.6666666665, ans=0.1 2023-11-26 21:30:28,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3579946.6666666665, ans=0.125 2023-11-26 21:30:32,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537000 2023-11-26 21:30:36,636 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7950, loss[loss=0.05545, simple_loss=0.07532, pruned_loss=0.009669, audio_tagging_loss=0.008119, over 14804.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08942, pruned_loss=0.01202, audio_tagging_loss=0.008863, over 3046739.82 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:30:44,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3580013.3333333335, ans=0.2 2023-11-26 21:30:49,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3580080.0, ans=0.0 2023-11-26 21:30:50,452 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:30:55,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3580080.0, ans=0.0 2023-11-26 21:31:11,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. 
limit=12.0 2023-11-26 21:31:15,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3580213.3333333335, ans=0.0 2023-11-26 21:31:21,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3580280.0, ans=0.125 2023-11-26 21:31:27,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537050 2023-11-26 21:31:31,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2023-11-26 21:31:31,824 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8000, loss[loss=0.06038, simple_loss=0.07734, pruned_loss=0.01051, audio_tagging_loss=0.0112, over 14694.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08826, pruned_loss=0.01189, audio_tagging_loss=0.008945, over 3044357.48 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:31:38,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3580346.6666666665, ans=0.1 2023-11-26 21:31:45,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.727e+01 9.223e+01 9.988e+01 1.687e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 21:31:51,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3580413.3333333335, ans=0.0 2023-11-26 21:31:54,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. limit=15.0 2023-11-26 21:32:13,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3580546.6666666665, ans=0.0 2023-11-26 21:32:22,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537100 2023-11-26 21:32:27,659 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8050, loss[loss=0.05663, simple_loss=0.07066, pruned_loss=0.01053, audio_tagging_loss=0.01077, over 15377.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08814, pruned_loss=0.01195, audio_tagging_loss=0.008994, over 3045991.03 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:33:02,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3580880.0, ans=0.5 2023-11-26 21:33:10,628 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:33:15,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3580946.6666666665, ans=0.0 2023-11-26 21:33:19,959 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537150 2023-11-26 21:33:20,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3580946.6666666665, ans=0.1 2023-11-26 21:33:24,178 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8100, loss[loss=0.0438, simple_loss=0.06685, pruned_loss=0.003999, audio_tagging_loss=0.006373, over 16014.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08849, pruned_loss=0.01191, audio_tagging_loss=0.008895, over 3046932.23 frames. 
], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:33:33,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3581080.0, ans=0.2 2023-11-26 21:33:36,870 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.942e+01 9.751e+01 1.046e+02 1.316e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 21:34:04,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3581213.3333333335, ans=0.2 2023-11-26 21:34:15,182 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537200 2023-11-26 21:34:19,652 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8150, loss[loss=0.06005, simple_loss=0.08367, pruned_loss=0.008504, audio_tagging_loss=0.009712, over 15905.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08915, pruned_loss=0.01208, audio_tagging_loss=0.008708, over 3041420.70 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:35:00,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3581546.6666666665, ans=0.2 2023-11-26 21:35:10,741 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537250 2023-11-26 21:35:11,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3581613.3333333335, ans=0.2 2023-11-26 21:35:15,031 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8200, loss[loss=0.05208, simple_loss=0.07464, pruned_loss=0.00758, audio_tagging_loss=0.007182, over 15519.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08957, pruned_loss=0.01201, audio_tagging_loss=0.008598, over 3039029.28 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:35:17,777 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:35:26,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.68 vs. limit=10.0 2023-11-26 21:35:29,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.811e+01 9.586e+01 1.032e+02 1.518e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:35:37,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3581813.3333333335, ans=0.0 2023-11-26 21:36:07,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2023-11-26 21:36:08,183 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537300 2023-11-26 21:36:08,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3581946.6666666665, ans=0.125 2023-11-26 21:36:12,449 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8250, loss[loss=0.03675, simple_loss=0.04875, pruned_loss=0.005705, audio_tagging_loss=0.006671, over 15056.00 frames. 
], tot_loss[loss=0.06461, simple_loss=0.08842, pruned_loss=0.01182, audio_tagging_loss=0.00858, over 3041483.77 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:36:44,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3582213.3333333335, ans=0.5 2023-11-26 21:36:58,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=22.5 2023-11-26 21:37:03,420 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537350 2023-11-26 21:37:07,512 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8300, loss[loss=0.06447, simple_loss=0.08946, pruned_loss=0.008288, audio_tagging_loss=0.01145, over 14293.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.0889, pruned_loss=0.01184, audio_tagging_loss=0.008605, over 3040316.69 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:37:19,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3582413.3333333335, ans=0.125 2023-11-26 21:37:19,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2023-11-26 21:37:20,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.995e+01 9.587e+01 1.028e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:37:35,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3582480.0, ans=0.0 2023-11-26 21:37:37,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3582480.0, ans=0.125 2023-11-26 21:37:46,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-26 21:37:50,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3582546.6666666665, ans=0.125 2023-11-26 21:37:50,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-26 21:37:53,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.25 vs. limit=22.5 2023-11-26 21:37:58,393 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537400 2023-11-26 21:37:59,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3582613.3333333335, ans=0.125 2023-11-26 21:38:02,862 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8350, loss[loss=0.07755, simple_loss=0.1073, pruned_loss=0.01742, audio_tagging_loss=0.006495, over 15339.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08845, pruned_loss=0.01183, audio_tagging_loss=0.008525, over 3046224.14 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:38:16,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3582746.6666666665, ans=10.0 2023-11-26 21:38:32,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3582813.3333333335, ans=0.2 2023-11-26 21:38:32,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-26 21:38:35,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3582880.0, ans=0.125 2023-11-26 21:38:43,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=12.0 2023-11-26 21:38:46,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3582946.6666666665, ans=0.1 2023-11-26 21:38:54,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=15.0 2023-11-26 21:38:54,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537450 2023-11-26 21:38:59,487 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8400, loss[loss=0.06335, simple_loss=0.08956, pruned_loss=0.01151, audio_tagging_loss=0.007065, over 15372.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08857, pruned_loss=0.01181, audio_tagging_loss=0.008478, over 3045005.87 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:39:03,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-26 21:39:13,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.925e+01 9.429e+01 9.938e+01 1.352e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 21:39:14,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3583080.0, ans=0.1 2023-11-26 21:39:25,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3583146.6666666665, ans=0.125 2023-11-26 21:39:27,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3583146.6666666665, ans=0.2 2023-11-26 21:39:36,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-26 21:39:49,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3583280.0, ans=0.125 2023-11-26 21:39:50,060 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537500 2023-11-26 21:39:54,179 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8450, loss[loss=0.04768, simple_loss=0.05861, pruned_loss=0.007431, audio_tagging_loss=0.01094, over 14559.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08843, pruned_loss=0.01193, audio_tagging_loss=0.008483, over 3044829.08 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:39:55,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2023-11-26 21:40:07,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3583413.3333333335, ans=0.125 2023-11-26 21:40:08,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3583413.3333333335, ans=0.05 2023-11-26 21:40:35,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3583546.6666666665, ans=0.0 2023-11-26 21:40:44,891 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537550 2023-11-26 21:40:49,053 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8500, loss[loss=0.06961, simple_loss=0.09476, pruned_loss=0.01449, audio_tagging_loss=0.007742, over 15371.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08906, pruned_loss=0.0121, audio_tagging_loss=0.008466, over 3048954.11 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:40:53,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3583680.0, ans=0.0 2023-11-26 21:40:53,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3583680.0, ans=0.0 2023-11-26 21:41:00,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2023-11-26 21:41:03,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3583746.6666666665, ans=0.125 2023-11-26 21:41:03,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3583746.6666666665, ans=0.2 2023-11-26 21:41:04,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.764e+01 9.533e+01 1.022e+02 1.336e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 21:41:11,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3583813.3333333335, ans=0.2 2023-11-26 21:41:27,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3583880.0, ans=0.0 2023-11-26 21:41:41,159 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537600 2023-11-26 21:41:45,586 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8550, loss[loss=0.06154, simple_loss=0.08137, pruned_loss=0.009998, audio_tagging_loss=0.01086, over 15202.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08897, pruned_loss=0.01197, audio_tagging_loss=0.008511, over 3048616.83 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:41:49,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3584013.3333333335, ans=0.1 2023-11-26 21:41:51,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. 
limit=22.5 2023-11-26 21:41:59,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3584080.0, ans=0.125 2023-11-26 21:42:02,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3584080.0, ans=0.125 2023-11-26 21:42:05,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-11-26 21:42:08,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3584146.6666666665, ans=0.1 2023-11-26 21:42:15,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3584146.6666666665, ans=0.1 2023-11-26 21:42:19,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3584213.3333333335, ans=0.07 2023-11-26 21:42:37,274 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537650 2023-11-26 21:42:41,431 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8600, loss[loss=0.07373, simple_loss=0.1053, pruned_loss=0.01421, audio_tagging_loss=0.006871, over 16105.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08879, pruned_loss=0.01204, audio_tagging_loss=0.008585, over 3044114.70 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:42:55,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.837e+01 9.386e+01 1.001e+02 1.418e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 21:42:57,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3584413.3333333335, ans=0.1 2023-11-26 21:43:20,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.95 vs. limit=10.0 2023-11-26 21:43:31,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3584613.3333333335, ans=0.0 2023-11-26 21:43:32,604 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537700 2023-11-26 21:43:36,831 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8650, loss[loss=0.07327, simple_loss=0.1012, pruned_loss=0.01402, audio_tagging_loss=0.008655, over 16726.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09031, pruned_loss=0.0124, audio_tagging_loss=0.008566, over 3040953.80 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:43:49,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. 
limit=15.0 2023-11-26 21:43:50,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3584746.6666666665, ans=0.09899494936611666 2023-11-26 21:44:05,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3584813.3333333335, ans=0.0 2023-11-26 21:44:09,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3584813.3333333335, ans=0.2 2023-11-26 21:44:11,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3584880.0, ans=0.125 2023-11-26 21:44:15,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-11-26 21:44:18,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=15.0 2023-11-26 21:44:28,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537750 2023-11-26 21:44:33,340 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8700, loss[loss=0.07639, simple_loss=0.103, pruned_loss=0.01465, audio_tagging_loss=0.01023, over 15371.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09031, pruned_loss=0.01228, audio_tagging_loss=0.008632, over 3042950.57 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:44:49,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.138e+01 9.810e+01 1.049e+02 1.289e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-26 21:44:50,269 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:44:51,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3585080.0, ans=0.0 2023-11-26 21:45:05,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2023-11-26 21:45:08,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3585213.3333333335, ans=0.09899494936611666 2023-11-26 21:45:24,811 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537800 2023-11-26 21:45:25,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3585280.0, ans=0.1 2023-11-26 21:45:29,262 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8750, loss[loss=0.05358, simple_loss=0.06922, pruned_loss=0.01018, audio_tagging_loss=0.008791, over 14735.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09065, pruned_loss=0.01255, audio_tagging_loss=0.008657, over 3048304.91 frames. 
], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:45:45,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3585413.3333333335, ans=0.125 2023-11-26 21:45:52,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3585480.0, ans=0.0 2023-11-26 21:45:59,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3585480.0, ans=0.0 2023-11-26 21:46:02,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3585546.6666666665, ans=0.125 2023-11-26 21:46:07,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.70 vs. limit=10.0 2023-11-26 21:46:20,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537850 2023-11-26 21:46:24,660 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8800, loss[loss=0.08507, simple_loss=0.116, pruned_loss=0.02035, audio_tagging_loss=0.006696, over 14367.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.092, pruned_loss=0.01269, audio_tagging_loss=0.008678, over 3046348.31 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:46:34,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3585746.6666666665, ans=0.125 2023-11-26 21:46:40,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.993e+01 9.548e+01 1.016e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 21:46:54,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3585813.3333333335, ans=0.125 2023-11-26 21:47:10,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3585946.6666666665, ans=0.125 2023-11-26 21:47:15,753 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537900 2023-11-26 21:47:20,527 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8850, loss[loss=0.07289, simple_loss=0.1002, pruned_loss=0.01306, audio_tagging_loss=0.009717, over 15880.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09159, pruned_loss=0.01268, audio_tagging_loss=0.008758, over 3043112.96 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:47:26,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3586013.3333333335, ans=0.125 2023-11-26 21:47:33,206 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:47:38,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586080.0, ans=0.1 2023-11-26 21:47:40,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. 
limit=12.0 2023-11-26 21:47:59,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3586213.3333333335, ans=0.07 2023-11-26 21:48:11,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=12.0 2023-11-26 21:48:12,665 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537950 2023-11-26 21:48:16,851 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8900, loss[loss=0.07977, simple_loss=0.1022, pruned_loss=0.02037, audio_tagging_loss=0.008302, over 16022.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09105, pruned_loss=0.01274, audio_tagging_loss=0.008702, over 3038838.23 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:48:18,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586346.6666666665, ans=0.1 2023-11-26 21:48:33,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.001e+01 9.576e+01 1.032e+02 1.288e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 21:48:38,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3586480.0, ans=0.0 2023-11-26 21:48:48,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3586480.0, ans=0.0 2023-11-26 21:48:50,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3586546.6666666665, ans=0.09899494936611666 2023-11-26 21:48:54,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3586546.6666666665, ans=0.125 2023-11-26 21:49:00,634 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:49:01,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586613.3333333335, ans=0.1 2023-11-26 21:49:07,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538000 2023-11-26 21:49:07,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3586613.3333333335, ans=0.125 2023-11-26 21:49:10,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3586613.3333333335, ans=0.125 2023-11-26 21:49:12,798 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8950, loss[loss=0.06632, simple_loss=0.08808, pruned_loss=0.01194, audio_tagging_loss=0.01034, over 15578.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09209, pruned_loss=0.01276, audio_tagging_loss=0.008474, over 3039017.19 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:49:13,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. 
limit=6.0 2023-11-26 21:49:39,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3586813.3333333335, ans=0.125 2023-11-26 21:49:41,365 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:49:51,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3586880.0, ans=0.125 2023-11-26 21:50:03,805 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538050 2023-11-26 21:50:08,019 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9000, loss[loss=0.08772, simple_loss=0.1291, pruned_loss=0.01588, audio_tagging_loss=0.007317, over 16320.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09155, pruned_loss=0.01261, audio_tagging_loss=0.008465, over 3036487.81 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:50:08,020 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 21:50:40,491 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05836, simple_loss=0.0505, pruned_loss=0.005274, audio_tagging_loss=0.02784, over 4681554.00 frames. 2023-11-26 21:50:40,492 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 21:50:43,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3587013.3333333335, ans=0.0 2023-11-26 21:50:47,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3587013.3333333335, ans=0.125 2023-11-26 21:50:51,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3587080.0, ans=0.0 2023-11-26 21:50:56,766 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.868e+01 9.363e+01 9.972e+01 1.329e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 21:51:07,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3587146.6666666665, ans=15.0 2023-11-26 21:51:08,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3587146.6666666665, ans=0.125 2023-11-26 21:51:26,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3587280.0, ans=0.0 2023-11-26 21:51:31,223 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538100 2023-11-26 21:51:32,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3587280.0, ans=0.07 2023-11-26 21:51:35,376 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9050, loss[loss=0.04989, simple_loss=0.06081, pruned_loss=0.01092, audio_tagging_loss=0.00856, over 14481.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09237, pruned_loss=0.01274, audio_tagging_loss=0.00837, over 3044126.25 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:51:37,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3587346.6666666665, ans=0.1 2023-11-26 21:51:59,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3587480.0, ans=0.0 2023-11-26 21:52:02,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3587480.0, ans=0.1 2023-11-26 21:52:26,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538150 2023-11-26 21:52:29,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3587613.3333333335, ans=0.0 2023-11-26 21:52:31,947 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9100, loss[loss=0.06853, simple_loss=0.09686, pruned_loss=0.01282, audio_tagging_loss=0.007276, over 15308.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09105, pruned_loss=0.01247, audio_tagging_loss=0.008382, over 3050077.35 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:52:40,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3587680.0, ans=0.125 2023-11-26 21:52:42,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3587746.6666666665, ans=0.1 2023-11-26 21:52:49,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.688e+01 9.524e+01 1.031e+02 1.397e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 21:53:23,839 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538200 2023-11-26 21:53:28,235 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9150, loss[loss=0.07273, simple_loss=0.1014, pruned_loss=0.01231, audio_tagging_loss=0.009708, over 16119.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09082, pruned_loss=0.01245, audio_tagging_loss=0.008396, over 3050661.82 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:53:43,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3588080.0, ans=0.125 2023-11-26 21:53:44,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3588080.0, ans=0.0 2023-11-26 21:53:51,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3588146.6666666665, ans=0.0 2023-11-26 21:54:16,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3588280.0, ans=0.0 2023-11-26 21:54:19,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538250 2023-11-26 21:54:19,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2023-11-26 21:54:23,512 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9200, loss[loss=0.09243, simple_loss=0.1256, pruned_loss=0.02306, audio_tagging_loss=0.006569, over 15511.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09027, pruned_loss=0.01233, audio_tagging_loss=0.008357, over 3046309.01 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:54:27,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3588346.6666666665, ans=0.2 2023-11-26 21:54:42,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.859e+01 9.629e+01 1.034e+02 1.503e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-26 21:55:12,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3588613.3333333335, ans=0.1 2023-11-26 21:55:15,021 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538300 2023-11-26 21:55:17,293 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:55:19,666 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9250, loss[loss=0.07378, simple_loss=0.1014, pruned_loss=0.01528, audio_tagging_loss=0.007784, over 16730.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08977, pruned_loss=0.01218, audio_tagging_loss=0.008406, over 3051206.30 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:55:20,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3588680.0, ans=0.125 2023-11-26 21:55:31,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=15.0 2023-11-26 21:55:44,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-11-26 21:55:53,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3588880.0, ans=0.125 2023-11-26 21:56:11,271 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538350 2023-11-26 21:56:15,963 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9300, loss[loss=0.05482, simple_loss=0.07243, pruned_loss=0.009435, audio_tagging_loss=0.009167, over 15807.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08988, pruned_loss=0.01204, audio_tagging_loss=0.008406, over 3048249.97 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:56:32,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.799e+01 9.431e+01 1.003e+02 1.401e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 21:56:34,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3589080.0, ans=0.125 2023-11-26 21:56:48,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2023-11-26 21:56:48,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2023-11-26 21:56:53,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. 
limit=12.0 2023-11-26 21:57:01,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3589280.0, ans=0.1 2023-11-26 21:57:07,052 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538400 2023-11-26 21:57:07,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3589280.0, ans=0.125 2023-11-26 21:57:11,536 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9350, loss[loss=0.04217, simple_loss=0.04908, pruned_loss=0.007694, audio_tagging_loss=0.009934, over 13952.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08987, pruned_loss=0.01217, audio_tagging_loss=0.008435, over 3043611.17 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:57:15,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3589346.6666666665, ans=0.125 2023-11-26 21:57:22,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3589413.3333333335, ans=0.0 2023-11-26 21:57:55,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3589613.3333333335, ans=0.125 2023-11-26 21:58:01,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3589613.3333333335, ans=0.2 2023-11-26 21:58:02,234 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-26 21:58:06,457 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9400, loss[loss=0.06518, simple_loss=0.08442, pruned_loss=0.01348, audio_tagging_loss=0.009483, over 14497.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09008, pruned_loss=0.01221, audio_tagging_loss=0.008556, over 3043922.51 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:58:06,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3589680.0, ans=0.125 2023-11-26 21:58:25,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 9.009e+01 9.595e+01 1.056e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 21:58:25,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589746.6666666665, ans=0.1 2023-11-26 21:58:29,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3589813.3333333335, ans=0.125 2023-11-26 21:58:46,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2023-11-26 21:58:51,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0 2023-11-26 21:58:58,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-26 21:59:03,566 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9450, loss[loss=0.07773, simple_loss=0.1072, pruned_loss=0.0165, audio_tagging_loss=0.007619, over 14313.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09025, pruned_loss=0.01237, audio_tagging_loss=0.008674, over 3051848.71 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:59:03,599 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:59:05,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0 2023-11-26 21:59:10,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0 2023-11-26 21:59:18,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3590080.0, ans=0.125 2023-11-26 21:59:18,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3590080.0, ans=0.2 2023-11-26 21:59:30,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3590146.6666666665, ans=0.125 2023-11-26 21:59:41,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3590213.3333333335, ans=0.125 2023-11-26 21:59:45,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3590213.3333333335, ans=0.125 2023-11-26 21:59:55,081 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-26 21:59:59,323 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9500, loss[loss=0.07864, simple_loss=0.1037, pruned_loss=0.01595, audio_tagging_loss=0.01082, over 14367.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09062, pruned_loss=0.01234, audio_tagging_loss=0.008675, over 3053237.43 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:00:18,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.000e+01 9.693e+01 1.049e+02 2.337e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-26 22:00:29,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3590480.0, ans=10.0 2023-11-26 22:00:50,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-26 22:00:52,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3590613.3333333335, ans=0.0 2023-11-26 22:00:55,218 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9550, loss[loss=0.05404, simple_loss=0.07112, pruned_loss=0.01034, audio_tagging_loss=0.008137, over 15897.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09069, pruned_loss=0.0124, audio_tagging_loss=0.008804, over 3051943.85 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:00:56,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3590680.0, ans=0.2 2023-11-26 22:01:01,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3590680.0, ans=0.125 2023-11-26 22:01:07,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3590746.6666666665, ans=0.125 2023-11-26 22:01:11,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3590746.6666666665, ans=0.125 2023-11-26 22:01:18,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3590813.3333333335, ans=0.5 2023-11-26 22:01:28,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3590880.0, ans=0.025 2023-11-26 22:01:45,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3590946.6666666665, ans=0.1 2023-11-26 22:01:46,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3590946.6666666665, ans=0.2 2023-11-26 22:01:47,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-26 22:01:52,742 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9600, loss[loss=0.07945, simple_loss=0.1163, pruned_loss=0.01325, audio_tagging_loss=0.00805, over 15087.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09116, pruned_loss=0.01245, audio_tagging_loss=0.008756, over 3055083.63 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:01:56,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-26 22:02:10,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.846e+01 9.558e+01 1.014e+02 1.385e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 22:02:12,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2023-11-26 22:02:21,440 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:02:24,244 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:02:38,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3591280.0, ans=0.2 2023-11-26 22:02:39,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3591280.0, ans=0.0 2023-11-26 22:02:40,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3591280.0, ans=0.125 2023-11-26 22:02:43,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-26 22:02:47,899 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9650, loss[loss=0.06363, simple_loss=0.08535, pruned_loss=0.01264, audio_tagging_loss=0.008307, over 14108.00 frames. 
], tot_loss[loss=0.06662, simple_loss=0.09094, pruned_loss=0.01237, audio_tagging_loss=0.008785, over 3058823.16 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:02:52,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3591346.6666666665, ans=0.125 2023-11-26 22:03:02,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3591413.3333333335, ans=0.125 2023-11-26 22:03:21,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=22.5 2023-11-26 22:03:29,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3591546.6666666665, ans=0.2 2023-11-26 22:03:38,594 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-26 22:03:39,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3591613.3333333335, ans=0.2 2023-11-26 22:03:42,848 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9700, loss[loss=0.04601, simple_loss=0.05485, pruned_loss=0.005855, audio_tagging_loss=0.01273, over 14567.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08947, pruned_loss=0.01212, audio_tagging_loss=0.008727, over 3045064.72 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:04:02,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.825e+01 9.473e+01 1.012e+02 1.378e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 22:04:15,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3591880.0, ans=0.125 2023-11-26 22:04:26,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-26 22:04:27,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-26 22:04:29,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-26 22:04:30,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3591946.6666666665, ans=0.2 2023-11-26 22:04:34,586 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-26 22:04:34,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-26 22:04:39,044 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9750, loss[loss=0.06829, simple_loss=0.09319, pruned_loss=0.0154, audio_tagging_loss=0.006302, over 15672.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08947, pruned_loss=0.01212, audio_tagging_loss=0.008676, over 3040758.60 frames. 
], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:05:00,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-26 22:05:03,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3592146.6666666665, ans=0.2 2023-11-26 22:05:05,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-26 22:05:16,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3592213.3333333335, ans=0.0 2023-11-26 22:05:30,191 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-26 22:05:34,339 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9800, loss[loss=0.07258, simple_loss=0.108, pruned_loss=0.01155, audio_tagging_loss=0.007035, over 16433.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09001, pruned_loss=0.01235, audio_tagging_loss=0.008514, over 3044472.66 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:05:43,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3592346.6666666665, ans=0.1 2023-11-26 22:05:52,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.733e+01 9.432e+01 1.005e+02 1.366e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 22:06:05,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3592480.0, ans=0.2 2023-11-26 22:06:08,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3592546.6666666665, ans=0.1 2023-11-26 22:06:15,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3592546.6666666665, ans=0.125 2023-11-26 22:06:25,735 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:06:25,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-26 22:06:29,955 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9850, loss[loss=0.06406, simple_loss=0.09322, pruned_loss=0.01187, audio_tagging_loss=0.005579, over 15373.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08994, pruned_loss=0.01221, audio_tagging_loss=0.008465, over 3039583.70 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:06:45,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3592746.6666666665, ans=0.0 2023-11-26 22:07:08,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.22 vs. 
limit=22.5 2023-11-26 22:07:18,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3592946.6666666665, ans=0.125 2023-11-26 22:07:21,925 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538950 2023-11-26 22:07:26,011 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9900, loss[loss=0.07482, simple_loss=0.1148, pruned_loss=0.01153, audio_tagging_loss=0.005876, over 16069.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09039, pruned_loss=0.01245, audio_tagging_loss=0.008462, over 3040635.25 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:07:33,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3593013.3333333335, ans=0.0 2023-11-26 22:07:33,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3593013.3333333335, ans=0.125 2023-11-26 22:07:34,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3593013.3333333335, ans=0.125 2023-11-26 22:07:42,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3593080.0, ans=0.2 2023-11-26 22:07:45,057 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 9.069e+01 9.666e+01 1.030e+02 3.243e+02, threshold=1.933e+02, percent-clipped=1.0 2023-11-26 22:07:50,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3593146.6666666665, ans=0.125 2023-11-26 22:07:56,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-26 22:08:16,898 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539000 2023-11-26 22:08:17,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=12.0 2023-11-26 22:08:21,879 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9950, loss[loss=0.05913, simple_loss=0.07489, pruned_loss=0.01109, audio_tagging_loss=0.0106, over 16916.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08969, pruned_loss=0.01232, audio_tagging_loss=0.008519, over 3039151.57 frames. ], batch size: 64, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:08:27,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3593346.6666666665, ans=0.125 2023-11-26 22:08:52,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-11-26 22:08:53,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3593546.6666666665, ans=0.125 2023-11-26 22:09:07,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.69 vs. 
limit=15.0 2023-11-26 22:09:12,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3593613.3333333335, ans=0.0 2023-11-26 22:09:12,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539050 2023-11-26 22:09:12,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3593613.3333333335, ans=0.125 2023-11-26 22:09:17,139 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10000, loss[loss=0.05154, simple_loss=0.06679, pruned_loss=0.007642, audio_tagging_loss=0.0105, over 14833.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08857, pruned_loss=0.01205, audio_tagging_loss=0.008634, over 3040456.90 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:09:17,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3593680.0, ans=0.125 2023-11-26 22:09:30,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593746.6666666665, ans=0.1 2023-11-26 22:09:35,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.750e+01 9.330e+01 1.017e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 22:09:41,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3593813.3333333335, ans=0.125 2023-11-26 22:09:47,419 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-26 22:10:06,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3593946.6666666665, ans=0.0 2023-11-26 22:10:07,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539100 2023-11-26 22:10:12,289 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10050, loss[loss=0.06586, simple_loss=0.08738, pruned_loss=0.01219, audio_tagging_loss=0.009977, over 15136.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.0886, pruned_loss=0.01208, audio_tagging_loss=0.008582, over 3041360.94 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:10:16,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3594013.3333333335, ans=0.2 2023-11-26 22:10:23,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3594080.0, ans=0.025 2023-11-26 22:10:26,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3594080.0, ans=0.1 2023-11-26 22:10:38,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3594146.6666666665, ans=0.0 2023-11-26 22:10:51,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=12.0 2023-11-26 22:11:03,170 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539150 2023-11-26 22:11:07,342 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10100, loss[loss=0.05845, simple_loss=0.0807, pruned_loss=0.008312, audio_tagging_loss=0.009792, over 15358.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08881, pruned_loss=0.01214, audio_tagging_loss=0.008666, over 3044448.51 frames. 
], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:11:14,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2023-11-26 22:11:27,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 9.131e+01 9.595e+01 1.046e+02 1.257e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 22:11:28,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3594480.0, ans=0.1 2023-11-26 22:11:50,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3594613.3333333335, ans=0.125 2023-11-26 22:11:53,693 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:11:58,609 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539200 2023-11-26 22:12:03,080 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10150, loss[loss=0.08408, simple_loss=0.1128, pruned_loss=0.01985, audio_tagging_loss=0.007814, over 15148.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08913, pruned_loss=0.0122, audio_tagging_loss=0.008658, over 3052058.69 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:12:06,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3594680.0, ans=0.2 2023-11-26 22:12:13,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3594746.6666666665, ans=0.2 2023-11-26 22:12:15,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3594746.6666666665, ans=0.0 2023-11-26 22:12:19,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3594746.6666666665, ans=10.0 2023-11-26 22:12:30,752 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:12:50,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3594946.6666666665, ans=0.0 2023-11-26 22:12:53,682 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539250 2023-11-26 22:12:54,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3594946.6666666665, ans=0.125 2023-11-26 22:12:58,456 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10200, loss[loss=0.07132, simple_loss=0.1005, pruned_loss=0.0125, audio_tagging_loss=0.008551, over 16518.00 frames. 
], tot_loss[loss=0.06626, simple_loss=0.09028, pruned_loss=0.01242, audio_tagging_loss=0.0087, over 3058356.30 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:13:08,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3595080.0, ans=0.2 2023-11-26 22:13:14,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3595080.0, ans=0.0 2023-11-26 22:13:18,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.041e+01 9.563e+01 1.048e+02 1.575e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 22:13:20,761 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:13:25,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-11-26 22:13:41,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-26 22:13:41,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2023-11-26 22:13:49,490 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539300 2023-11-26 22:13:54,242 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10250, loss[loss=0.05978, simple_loss=0.07878, pruned_loss=0.01074, audio_tagging_loss=0.009649, over 14710.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09095, pruned_loss=0.01254, audio_tagging_loss=0.008694, over 3048727.64 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:13:57,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3595346.6666666665, ans=0.125 2023-11-26 22:14:13,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3595413.3333333335, ans=0.0 2023-11-26 22:14:16,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3595480.0, ans=0.125 2023-11-26 22:14:39,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3595613.3333333335, ans=0.125 2023-11-26 22:14:41,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3595613.3333333335, ans=0.2 2023-11-26 22:14:43,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3595613.3333333335, ans=0.0 2023-11-26 22:14:43,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. 
limit=10.0 2023-11-26 22:14:45,468 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539350 2023-11-26 22:14:46,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3595613.3333333335, ans=0.125 2023-11-26 22:14:49,565 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10300, loss[loss=0.07071, simple_loss=0.1049, pruned_loss=0.01204, audio_tagging_loss=0.006205, over 16539.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09155, pruned_loss=0.01265, audio_tagging_loss=0.008636, over 3051017.45 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:15:08,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2023-11-26 22:15:10,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 9.184e+01 9.815e+01 1.071e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-26 22:15:12,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3595813.3333333335, ans=0.1 2023-11-26 22:15:35,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3595946.6666666665, ans=0.125 2023-11-26 22:15:41,416 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539400 2023-11-26 22:15:45,996 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10350, loss[loss=0.06818, simple_loss=0.09557, pruned_loss=0.01062, audio_tagging_loss=0.009776, over 15791.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.0915, pruned_loss=0.01254, audio_tagging_loss=0.008726, over 3057502.11 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:15:50,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3596013.3333333335, ans=0.2 2023-11-26 22:15:55,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3596013.3333333335, ans=0.125 2023-11-26 22:16:20,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3596213.3333333335, ans=0.125 2023-11-26 22:16:26,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3596213.3333333335, ans=0.0 2023-11-26 22:16:38,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-26 22:16:43,142 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10400, loss[loss=0.05695, simple_loss=0.07495, pruned_loss=0.008793, audio_tagging_loss=0.01068, over 15521.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09038, pruned_loss=0.01229, audio_tagging_loss=0.008863, over 3049937.88 frames. 
], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:17:02,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.954e+01 9.594e+01 1.032e+02 1.312e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 22:17:31,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3596613.3333333335, ans=0.0 2023-11-26 22:17:34,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-26 22:17:35,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3596613.3333333335, ans=0.2 2023-11-26 22:17:38,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3596680.0, ans=0.125 2023-11-26 22:17:38,822 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10450, loss[loss=0.07644, simple_loss=0.1026, pruned_loss=0.01609, audio_tagging_loss=0.009051, over 15475.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09, pruned_loss=0.01234, audio_tagging_loss=0.008925, over 3042798.68 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:18:10,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3596813.3333333335, ans=0.2 2023-11-26 22:18:29,977 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-26 22:18:34,744 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10500, loss[loss=0.06258, simple_loss=0.08166, pruned_loss=0.01427, audio_tagging_loss=0.007479, over 14422.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09011, pruned_loss=0.01225, audio_tagging_loss=0.008785, over 3053331.89 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:18:51,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.83 vs. limit=15.0 2023-11-26 22:18:53,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3597080.0, ans=0.125 2023-11-26 22:18:55,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.749e+01 9.296e+01 1.026e+02 1.262e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 22:18:55,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3597080.0, ans=0.125 2023-11-26 22:19:01,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597146.6666666665, ans=0.1 2023-11-26 22:19:21,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3597280.0, ans=0.125 2023-11-26 22:19:26,535 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-26 22:19:30,990 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10550, loss[loss=0.06218, simple_loss=0.08763, pruned_loss=0.009287, audio_tagging_loss=0.009079, over 16212.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08912, pruned_loss=0.01201, audio_tagging_loss=0.008713, over 3057303.11 frames. 
], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:19:38,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3597346.6666666665, ans=0.0 2023-11-26 22:19:41,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3597413.3333333335, ans=0.025 2023-11-26 22:19:42,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3597413.3333333335, ans=0.125 2023-11-26 22:19:43,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3597413.3333333335, ans=0.0 2023-11-26 22:19:44,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3597413.3333333335, ans=0.125 2023-11-26 22:19:47,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3597413.3333333335, ans=0.0 2023-11-26 22:19:56,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3597480.0, ans=0.125 2023-11-26 22:20:20,588 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:20:22,561 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-26 22:20:22,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3597613.3333333335, ans=0.0 2023-11-26 22:20:26,734 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10600, loss[loss=0.07629, simple_loss=0.106, pruned_loss=0.01571, audio_tagging_loss=0.00757, over 15590.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08837, pruned_loss=0.01188, audio_tagging_loss=0.008684, over 3056686.76 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:20:28,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3597680.0, ans=0.125 2023-11-26 22:20:47,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.604e+01 9.125e+01 9.885e+01 1.207e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 22:20:51,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3597813.3333333335, ans=0.125 2023-11-26 22:20:54,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3597813.3333333335, ans=0.125 2023-11-26 22:21:17,950 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-26 22:21:22,194 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10650, loss[loss=0.05832, simple_loss=0.07974, pruned_loss=0.008615, audio_tagging_loss=0.009832, over 14626.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08832, pruned_loss=0.0119, audio_tagging_loss=0.008749, over 3049225.14 frames. 
], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:21:26,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3598013.3333333335, ans=0.0 2023-11-26 22:21:55,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3598213.3333333335, ans=0.125 2023-11-26 22:22:00,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3598213.3333333335, ans=0.125 2023-11-26 22:22:14,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-26 22:22:18,301 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10700, loss[loss=0.04723, simple_loss=0.06247, pruned_loss=0.008274, audio_tagging_loss=0.007719, over 15364.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08946, pruned_loss=0.01203, audio_tagging_loss=0.008685, over 3046503.21 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:22:29,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3598413.3333333335, ans=0.2 2023-11-26 22:22:35,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3598413.3333333335, ans=0.125 2023-11-26 22:22:37,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.850e+01 9.452e+01 1.010e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 22:22:43,824 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.93 vs. limit=10.0 2023-11-26 22:22:44,766 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-26 22:22:50,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3598546.6666666665, ans=15.0 2023-11-26 22:23:09,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-11-26 22:23:09,826 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-26 22:23:14,293 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10750, loss[loss=0.06932, simple_loss=0.1035, pruned_loss=0.01148, audio_tagging_loss=0.006078, over 15755.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08959, pruned_loss=0.01203, audio_tagging_loss=0.00871, over 3038246.35 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:23:16,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3598680.0, ans=0.0 2023-11-26 22:23:27,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3598746.6666666665, ans=0.125 2023-11-26 22:23:52,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3598880.0, ans=0.2 2023-11-26 22:23:59,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. 
limit=6.0 2023-11-26 22:24:02,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3598946.6666666665, ans=0.1 2023-11-26 22:24:04,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3598946.6666666665, ans=0.1 2023-11-26 22:24:04,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3598946.6666666665, ans=0.125 2023-11-26 22:24:05,266 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-26 22:24:09,425 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10800, loss[loss=0.06151, simple_loss=0.08609, pruned_loss=0.009926, audio_tagging_loss=0.008536, over 14469.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08877, pruned_loss=0.01185, audio_tagging_loss=0.008636, over 3038896.03 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:24:14,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3599013.3333333335, ans=0.125 2023-11-26 22:24:31,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.827e+01 9.312e+01 1.017e+02 1.289e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 22:24:44,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3599213.3333333335, ans=0.07 2023-11-26 22:24:47,748 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:25:00,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599280.0, ans=0.1 2023-11-26 22:25:00,997 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-26 22:25:01,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3599280.0, ans=0.125 2023-11-26 22:25:06,352 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10850, loss[loss=0.03799, simple_loss=0.046, pruned_loss=0.00462, audio_tagging_loss=0.01037, over 15926.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08832, pruned_loss=0.01183, audio_tagging_loss=0.008634, over 3044363.26 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:25:07,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3599346.6666666665, ans=0.125 2023-11-26 22:25:11,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-26 22:25:17,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3599413.3333333335, ans=0.1 2023-11-26 22:25:48,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.97 vs. limit=10.0 2023-11-26 22:25:56,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.83 vs. 
limit=22.5 2023-11-26 22:25:57,959 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-26 22:25:58,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3599613.3333333335, ans=0.125 2023-11-26 22:26:00,072 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:26:02,139 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10900, loss[loss=0.07716, simple_loss=0.1146, pruned_loss=0.01166, audio_tagging_loss=0.008179, over 15775.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08862, pruned_loss=0.01182, audio_tagging_loss=0.008664, over 3040196.44 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:26:02,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3599680.0, ans=0.2 2023-11-26 22:26:19,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3599746.6666666665, ans=0.1 2023-11-26 22:26:22,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3599813.3333333335, ans=0.0 2023-11-26 22:26:23,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 9.085e+01 9.626e+01 1.024e+02 1.281e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 22:26:32,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3599813.3333333335, ans=0.95 2023-11-26 22:26:33,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3599813.3333333335, ans=0.05 2023-11-26 22:26:53,220 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-26 22:26:53,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3599946.6666666665, ans=0.125 2023-11-26 22:26:59,551 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10950, loss[loss=0.0699, simple_loss=0.1028, pruned_loss=0.01176, audio_tagging_loss=0.00675, over 14885.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08911, pruned_loss=0.012, audio_tagging_loss=0.008636, over 3043558.60 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:27:00,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.45 vs. 
limit=22.5 2023-11-26 22:27:01,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3600013.3333333335, ans=0.0 2023-11-26 22:27:22,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3600146.6666666665, ans=0.0 2023-11-26 22:27:24,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3600146.6666666665, ans=0.0 2023-11-26 22:27:24,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3600146.6666666665, ans=0.2 2023-11-26 22:27:28,297 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:27:33,913 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2023-11-26 22:27:37,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3600213.3333333335, ans=0.2 2023-11-26 22:27:46,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3600280.0, ans=0.1 2023-11-26 22:27:50,445 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-26 22:27:55,670 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11000, loss[loss=0.06015, simple_loss=0.08851, pruned_loss=0.00833, audio_tagging_loss=0.007564, over 15104.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08894, pruned_loss=0.01186, audio_tagging_loss=0.008628, over 3051832.88 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:27:55,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3600346.6666666665, ans=0.2 2023-11-26 22:27:58,160 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2023-11-26 22:27:58,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5 2023-11-26 22:28:07,299 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 22:28:09,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3600413.3333333335, ans=0.025 2023-11-26 22:28:17,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.990e+01 9.480e+01 9.957e+01 3.729e+02, threshold=1.896e+02, percent-clipped=1.0 2023-11-26 22:28:29,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3600546.6666666665, ans=0.1 2023-11-26 22:28:47,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-26 22:28:52,033 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11050, loss[loss=0.07909, simple_loss=0.1166, pruned_loss=0.0139, audio_tagging_loss=0.006886, over 15028.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.0892, pruned_loss=0.01195, audio_tagging_loss=0.008661, over 3046025.98 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:29:38,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3600946.6666666665, ans=0.125 2023-11-26 22:29:42,420 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-26 22:29:45,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3601013.3333333335, ans=0.2 2023-11-26 22:29:46,561 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11100, loss[loss=0.05889, simple_loss=0.08058, pruned_loss=0.01019, audio_tagging_loss=0.00841, over 15573.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08945, pruned_loss=0.01186, audio_tagging_loss=0.008793, over 3049566.38 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:29:58,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3601080.0, ans=0.0 2023-11-26 22:30:08,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.991e+01 9.032e+01 9.689e+01 1.034e+02 1.564e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 22:30:23,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2023-11-26 22:30:24,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3601213.3333333335, ans=0.1 2023-11-26 22:30:32,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3601280.0, ans=0.125 2023-11-26 22:30:33,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3601280.0, ans=0.125 2023-11-26 22:30:37,327 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-26 22:30:37,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3601280.0, ans=0.2 2023-11-26 22:30:42,463 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11150, loss[loss=0.04325, simple_loss=0.04919, pruned_loss=0.007282, audio_tagging_loss=0.01138, over 14829.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08963, pruned_loss=0.01197, audio_tagging_loss=0.008936, over 3047282.28 frames. 
], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:30:45,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0 2023-11-26 22:30:51,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3601346.6666666665, ans=0.0 2023-11-26 22:31:21,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3601546.6666666665, ans=0.1 2023-11-26 22:31:33,865 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-26 22:31:38,610 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11200, loss[loss=0.08062, simple_loss=0.1113, pruned_loss=0.01924, audio_tagging_loss=0.005713, over 16267.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09059, pruned_loss=0.01211, audio_tagging_loss=0.008978, over 3052757.61 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:31:58,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3601746.6666666665, ans=0.0 2023-11-26 22:32:00,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2023-11-26 22:32:01,319 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.768e+01 9.515e+01 1.029e+02 1.320e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:32:16,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3601880.0, ans=0.125 2023-11-26 22:32:30,213 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-26 22:32:34,392 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11250, loss[loss=0.08606, simple_loss=0.1259, pruned_loss=0.01672, audio_tagging_loss=0.006391, over 16344.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08979, pruned_loss=0.01195, audio_tagging_loss=0.008977, over 3058577.18 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:32:35,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3602013.3333333335, ans=0.125 2023-11-26 22:33:08,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3602213.3333333335, ans=0.125 2023-11-26 22:33:10,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3602213.3333333335, ans=0.0 2023-11-26 22:33:14,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0 2023-11-26 22:33:22,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-11-26 22:33:25,521 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-26 22:33:26,224 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.40 vs. 
limit=22.5 2023-11-26 22:33:29,737 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11300, loss[loss=0.0857, simple_loss=0.1229, pruned_loss=0.01825, audio_tagging_loss=0.005972, over 15551.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08968, pruned_loss=0.01199, audio_tagging_loss=0.008851, over 3053270.08 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:33:36,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0 2023-11-26 22:33:45,851 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-26 22:33:54,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.683e+01 9.336e+01 1.007e+02 1.340e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 22:34:16,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3602613.3333333335, ans=0.0 2023-11-26 22:34:21,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-26 22:34:26,372 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11350, loss[loss=0.06321, simple_loss=0.09253, pruned_loss=0.009806, audio_tagging_loss=0.00714, over 15203.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08969, pruned_loss=0.01199, audio_tagging_loss=0.008701, over 3047604.58 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:34:32,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3602680.0, ans=0.1 2023-11-26 22:34:53,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3602813.3333333335, ans=0.05 2023-11-26 22:34:54,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3602813.3333333335, ans=0.0 2023-11-26 22:34:55,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-11-26 22:34:57,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3602813.3333333335, ans=0.5 2023-11-26 22:35:07,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3602880.0, ans=0.07 2023-11-26 22:35:17,901 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-26 22:35:22,600 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11400, loss[loss=0.05353, simple_loss=0.06923, pruned_loss=0.007076, audio_tagging_loss=0.01184, over 15022.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09019, pruned_loss=0.01203, audio_tagging_loss=0.008587, over 3042645.65 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:35:37,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3603080.0, ans=0.125 2023-11-26 22:35:38,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. 
limit=15.0 2023-11-26 22:35:46,001 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.997e+01 9.516e+01 1.035e+02 1.684e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:35:49,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-11-26 22:35:59,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3603213.3333333335, ans=15.0 2023-11-26 22:36:13,498 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-26 22:36:17,736 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11450, loss[loss=0.06309, simple_loss=0.08887, pruned_loss=0.01196, audio_tagging_loss=0.006696, over 15661.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09033, pruned_loss=0.01202, audio_tagging_loss=0.008513, over 3039666.88 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:36:31,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3603413.3333333335, ans=10.0 2023-11-26 22:36:39,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3603480.0, ans=0.0 2023-11-26 22:36:43,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3603480.0, ans=0.1 2023-11-26 22:36:49,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3603480.0, ans=0.125 2023-11-26 22:37:09,942 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-26 22:37:14,186 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11500, loss[loss=0.06855, simple_loss=0.09192, pruned_loss=0.01471, audio_tagging_loss=0.007878, over 15502.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08944, pruned_loss=0.01199, audio_tagging_loss=0.008538, over 3042800.51 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:37:16,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3603680.0, ans=0.125 2023-11-26 22:37:22,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3603680.0, ans=0.0 2023-11-26 22:37:23,874 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5 2023-11-26 22:37:29,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-11-26 22:37:37,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.979e+01 9.575e+01 1.038e+02 1.869e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 22:37:40,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.77 vs. limit=15.0 2023-11-26 22:38:05,439 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-26 22:38:09,900 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11550, loss[loss=0.08469, simple_loss=0.1109, pruned_loss=0.02059, audio_tagging_loss=0.008638, over 14917.00 frames. 
], tot_loss[loss=0.06517, simple_loss=0.08915, pruned_loss=0.01204, audio_tagging_loss=0.008563, over 3049975.80 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:38:43,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3604213.3333333335, ans=15.0 2023-11-26 22:38:46,076 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:38:54,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3604280.0, ans=0.125 2023-11-26 22:39:01,605 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-26 22:39:05,791 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11600, loss[loss=0.05971, simple_loss=0.08203, pruned_loss=0.008064, audio_tagging_loss=0.01063, over 14322.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08866, pruned_loss=0.01197, audio_tagging_loss=0.008581, over 3052609.01 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:39:30,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.802e+01 1.035e+02 1.553e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 22:39:45,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3604546.6666666665, ans=0.125 2023-11-26 22:39:57,439 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-26 22:40:02,168 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11650, loss[loss=0.0974, simple_loss=0.1375, pruned_loss=0.02101, audio_tagging_loss=0.007649, over 15091.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08973, pruned_loss=0.01221, audio_tagging_loss=0.008624, over 3050746.08 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:40:03,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604680.0, ans=0.1 2023-11-26 22:40:04,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604680.0, ans=0.1 2023-11-26 22:40:19,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3604746.6666666665, ans=0.125 2023-11-26 22:40:26,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3604813.3333333335, ans=0.125 2023-11-26 22:40:30,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3604813.3333333335, ans=0.1 2023-11-26 22:40:35,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3604880.0, ans=0.125 2023-11-26 22:40:48,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. 
limit=10.0 2023-11-26 22:40:51,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3604946.6666666665, ans=0.0 2023-11-26 22:40:53,888 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-26 22:40:55,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3604946.6666666665, ans=0.0 2023-11-26 22:40:58,075 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11700, loss[loss=0.08368, simple_loss=0.1127, pruned_loss=0.02012, audio_tagging_loss=0.007202, over 14106.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08976, pruned_loss=0.01224, audio_tagging_loss=0.008645, over 3049802.17 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:41:00,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3605013.3333333335, ans=0.125 2023-11-26 22:41:01,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3605013.3333333335, ans=0.1 2023-11-26 22:41:17,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3605080.0, ans=0.0 2023-11-26 22:41:23,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.888e+01 9.584e+01 1.031e+02 1.555e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 22:41:25,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3605146.6666666665, ans=0.07 2023-11-26 22:41:26,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3605146.6666666665, ans=0.125 2023-11-26 22:41:31,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3605213.3333333335, ans=0.1 2023-11-26 22:41:49,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-26 22:41:54,390 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11750, loss[loss=0.05858, simple_loss=0.07523, pruned_loss=0.01188, audio_tagging_loss=0.009087, over 15336.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09043, pruned_loss=0.01242, audio_tagging_loss=0.008662, over 3047639.32 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:42:07,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3605413.3333333335, ans=0.2 2023-11-26 22:42:24,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3605480.0, ans=0.125 2023-11-26 22:42:30,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3605546.6666666665, ans=0.2 2023-11-26 22:42:30,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3605546.6666666665, ans=0.0 2023-11-26 22:42:45,766 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540850 2023-11-26 22:42:50,422 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11800, loss[loss=0.06914, simple_loss=0.09599, pruned_loss=0.01422, audio_tagging_loss=0.006917, over 14706.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08979, pruned_loss=0.01235, audio_tagging_loss=0.008699, over 3044956.89 frames. 
], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:43:06,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3605746.6666666665, ans=0.0 2023-11-26 22:43:11,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3605813.3333333335, ans=0.0 2023-11-26 22:43:13,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3605813.3333333335, ans=0.025 2023-11-26 22:43:14,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.659e+01 9.498e+01 1.042e+02 1.310e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 22:43:42,109 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540900 2023-11-26 22:43:46,313 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11850, loss[loss=0.06346, simple_loss=0.08558, pruned_loss=0.009857, audio_tagging_loss=0.01082, over 15729.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09111, pruned_loss=0.01264, audio_tagging_loss=0.008709, over 3044131.50 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:43:54,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3606013.3333333335, ans=0.0 2023-11-26 22:43:57,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2023-11-26 22:44:23,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-26 22:44:24,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-26 22:44:24,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3606213.3333333335, ans=0.2 2023-11-26 22:44:29,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-26 22:44:35,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3606280.0, ans=0.125 2023-11-26 22:44:37,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540950 2023-11-26 22:44:41,650 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11900, loss[loss=0.07682, simple_loss=0.1087, pruned_loss=0.01537, audio_tagging_loss=0.007105, over 16158.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09096, pruned_loss=0.01246, audio_tagging_loss=0.008775, over 3045796.35 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:44:42,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5 2023-11-26 22:44:44,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.88 vs. 
limit=22.5 2023-11-26 22:44:58,100 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:45:04,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2023-11-26 22:45:07,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.979e+01 9.678e+01 1.018e+02 1.926e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-26 22:45:17,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3606546.6666666665, ans=0.05 2023-11-26 22:45:22,415 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2023-11-26 22:45:26,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3606613.3333333335, ans=0.125 2023-11-26 22:45:28,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606613.3333333335, ans=0.1 2023-11-26 22:45:33,283 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541000 2023-11-26 22:45:38,254 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11950, loss[loss=0.07331, simple_loss=0.1031, pruned_loss=0.01382, audio_tagging_loss=0.007922, over 14706.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09018, pruned_loss=0.01236, audio_tagging_loss=0.008935, over 3041688.01 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:45:51,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3606746.6666666665, ans=0.125 2023-11-26 22:46:14,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3606880.0, ans=0.125 2023-11-26 22:46:19,205 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=12.0 2023-11-26 22:46:27,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606946.6666666665, ans=0.1 2023-11-26 22:46:27,961 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-26 22:46:31,994 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 12000, loss[loss=0.05103, simple_loss=0.0748, pruned_loss=0.005121, audio_tagging_loss=0.008512, over 15693.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08983, pruned_loss=0.01236, audio_tagging_loss=0.00896, over 3042093.23 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:46:31,995 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 22:47:04,442 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05747, simple_loss=0.05048, pruned_loss=0.005268, audio_tagging_loss=0.02696, over 4681554.00 frames. 
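Each `loss[...]` in these records is a single batch's value, while `tot_loss[...]` is an average over all frames seen so far in the epoch, weighted by the `over N frames` counts; that is why `tot_loss` drifts slowly while the per-batch `loss` fluctuates, and why the two coincide exactly at batch 0 (see the `Epoch 46, batch 0` entry below). A minimal sketch of that frame-weighted bookkeeping follows; the class name `RunningLoss` is an assumption for illustration, not the training script's actual tracker:

```python
from collections import defaultdict

class RunningLoss:
    """Frame-weighted running average of loss components (illustrative sketch)."""

    def __init__(self):
        self.sums = defaultdict(float)  # per-component sum of value * frames
        self.frames = 0.0               # total frames accumulated so far

    def update(self, batch_losses: dict, num_frames: float) -> None:
        # batch_losses holds per-frame averages for one batch, e.g.
        # {"loss": 0.06624, "simple_loss": 0.08983, "pruned_loss": 0.01236, ...}
        for name, value in batch_losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        # These are the numbers printed as tot_loss[... over <frames> frames.]
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = RunningLoss()
tracker.update({"loss": 0.066, "simple_loss": 0.090, "pruned_loss": 0.012,
                "audio_tagging_loss": 0.0086}, num_frames=15000)  # illustrative values
print(tracker.averages())  # after one batch, equals that batch's own values
```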
2023-11-26 22:47:04,443 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 22:47:08,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3607013.3333333335, ans=0.125 2023-11-26 22:47:14,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3607080.0, ans=0.2 2023-11-26 22:47:15,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3607080.0, ans=0.2 2023-11-26 22:47:17,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3607080.0, ans=0.0 2023-11-26 22:47:18,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3607080.0, ans=0.125 2023-11-26 22:47:26,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.102e+01 9.829e+01 1.057e+02 1.323e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 22:47:57,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-26 22:47:58,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2023-11-26 22:48:02,328 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 0, loss[loss=0.06794, simple_loss=0.06831, pruned_loss=0.009865, audio_tagging_loss=0.02392, over 15313.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.06831, pruned_loss=0.009865, audio_tagging_loss=0.02392, over 15313.00 frames. ], batch size: 58, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:48:02,329 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 22:48:13,039 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.4696, 6.1083, 6.4127, 5.9183], device='cuda:2') 2023-11-26 22:48:13,988 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6583, 3.0951, 3.2279, 2.7249, 3.4485, 3.4588, 3.4575, 3.4205], device='cuda:2') 2023-11-26 22:48:33,849 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05779, simple_loss=0.05056, pruned_loss=0.005325, audio_tagging_loss=0.02718, over 4681554.00 frames. 
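The recurring `optim.py` entries report the distribution of recent gradient norms (min, quartiles, max) along with the clipping threshold and the fraction of batches clipped. Throughout this stretch the threshold sits at roughly `Clipping_scale` (2.0) times the median quartile, e.g. 9.829e+01 × 2.0 ≈ 1.966e+02 in the entry above. Below is a sketch of clipping driven by a running median that reproduces those statistics; the window size and function name are assumptions, not the optimizer's actual implementation:

```python
import torch

def clip_with_running_median(params, norm_history, clipping_scale=2.0, window=128):
    """Sketch: clip gradients at clipping_scale times the median of recent norms.

    Only illustrates the statistics the log reports (quartiles, threshold,
    percent-clipped); the real optimizer logic is considerably more involved.
    """
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g) for g in grads]))
    norm_history.append(float(total_norm))

    recent = sorted(norm_history[-window:])          # sliding window of recent norms
    q = lambda f: recent[int(f * (len(recent) - 1))]
    quartiles = [q(0.0), q(0.25), q(0.5), q(0.75), q(1.0)]  # as printed in the log
    threshold = clipping_scale * quartiles[2]        # ~2 x median, matching the entries above

    clipped = float(total_norm) > threshold          # contributes to percent-clipped
    if clipped:
        for g in grads:
            g.mul_(threshold / float(total_norm))
    return quartiles, threshold, clipped
```

Note that on the very first call the history holds only the current norm, so the median equals it, the threshold is twice it, and nothing is clipped; clipping only kicks in once a batch's norm stands out against the recent window, which is consistent with `percent-clipped` staying at 0.0 for most entries here.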
2023-11-26 22:48:33,850 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 22:48:34,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-26 22:48:36,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3607186.6666666665, ans=0.2 2023-11-26 22:48:55,502 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-26 22:48:55,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3607320.0, ans=0.125 2023-11-26 22:49:06,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3607386.6666666665, ans=0.04949747468305833 2023-11-26 22:49:06,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0 2023-11-26 22:49:22,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=12.0 2023-11-26 22:49:27,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2023-11-26 22:49:28,968 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 50, loss[loss=0.08108, simple_loss=0.1007, pruned_loss=0.01546, audio_tagging_loss=0.01525, over 14594.00 frames. ], tot_loss[loss=0.07166, simple_loss=0.08504, pruned_loss=0.01185, audio_tagging_loss=0.01729, over 680352.58 frames. ], batch size: 54, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:49:50,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-26 22:50:01,593 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:50:03,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3607720.0, ans=0.09899494936611666 2023-11-26 22:50:04,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2023-11-26 22:50:13,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.55 vs. limit=10.0 2023-11-26 22:50:20,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.136e+01 9.821e+01 1.049e+02 1.148e+02 1.594e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-26 22:50:24,341 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 100, loss[loss=0.05064, simple_loss=0.05385, pruned_loss=0.005349, audio_tagging_loss=0.01837, over 14191.00 frames. ], tot_loss[loss=0.07148, simple_loss=0.08616, pruned_loss=0.01181, audio_tagging_loss=0.01659, over 1202929.26 frames. 
], batch size: 56, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:50:47,202 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-26 22:50:49,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3607986.6666666665, ans=0.0 2023-11-26 22:50:54,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3607986.6666666665, ans=0.0 2023-11-26 22:51:00,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2023-11-26 22:51:10,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3608120.0, ans=0.125 2023-11-26 22:51:16,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3608120.0, ans=0.125 2023-11-26 22:51:18,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=22.5 2023-11-26 22:51:20,244 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 150, loss[loss=0.06389, simple_loss=0.08435, pruned_loss=0.008225, audio_tagging_loss=0.01349, over 14938.00 frames. ], tot_loss[loss=0.06992, simple_loss=0.08687, pruned_loss=0.01177, audio_tagging_loss=0.01472, over 1609429.52 frames. ], batch size: 56, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:51:29,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3608186.6666666665, ans=0.0 2023-11-26 22:51:33,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3608253.3333333335, ans=0.0 2023-11-26 22:51:37,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608253.3333333335, ans=0.1 2023-11-26 22:51:43,337 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-26 22:51:44,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.84 vs. 
limit=22.5 2023-11-26 22:51:54,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-26 22:51:55,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-26 22:52:02,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608386.6666666665, ans=0.1 2023-11-26 22:52:04,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-26 22:52:11,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608453.3333333335, ans=0.1 2023-11-26 22:52:12,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.648e+01 9.369e+01 9.812e+01 1.037e+02 1.267e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-26 22:52:13,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2023-11-26 22:52:13,958 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:52:14,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608453.3333333335, ans=0.1 2023-11-26 22:52:16,844 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 200, loss[loss=0.05606, simple_loss=0.07183, pruned_loss=0.01098, audio_tagging_loss=0.009167, over 14352.00 frames. ], tot_loss[loss=0.06915, simple_loss=0.08828, pruned_loss=0.01201, audio_tagging_loss=0.01301, over 1924970.89 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:52:38,593 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-26 22:52:50,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-26 22:52:51,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3608720.0, ans=0.125 2023-11-26 22:52:58,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3608720.0, ans=0.0 2023-11-26 22:53:10,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608786.6666666665, ans=0.1 2023-11-26 22:53:11,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3608853.3333333335, ans=0.125 2023-11-26 22:53:12,701 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 250, loss[loss=0.08754, simple_loss=0.1224, pruned_loss=0.01809, audio_tagging_loss=0.008251, over 15488.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.0912, pruned_loss=0.01237, audio_tagging_loss=0.01162, over 2186652.31 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:53:35,455 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-26 22:53:40,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.63 vs. limit=10.0 2023-11-26 22:53:44,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3608986.6666666665, ans=0.0 2023-11-26 22:54:05,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.060e+01 9.556e+01 1.038e+02 1.375e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 22:54:09,082 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 300, loss[loss=0.07003, simple_loss=0.1044, pruned_loss=0.01238, audio_tagging_loss=0.005445, over 14520.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.09146, pruned_loss=0.01245, audio_tagging_loss=0.01065, over 2374211.08 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:54:11,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3609186.6666666665, ans=0.07 2023-11-26 22:54:18,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3609186.6666666665, ans=0.125 2023-11-26 22:54:19,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3609253.3333333335, ans=0.125 2023-11-26 22:54:23,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.15 vs. limit=10.0 2023-11-26 22:54:32,049 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-26 22:54:48,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3609386.6666666665, ans=0.2 2023-11-26 22:54:51,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-26 22:55:04,959 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 350, loss[loss=0.04643, simple_loss=0.05882, pruned_loss=0.007898, audio_tagging_loss=0.009127, over 14488.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09083, pruned_loss=0.01249, audio_tagging_loss=0.01014, over 2521636.49 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:55:11,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3609520.0, ans=0.1 2023-11-26 22:55:14,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3609520.0, ans=0.2 2023-11-26 22:55:18,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3609586.6666666665, ans=0.125 2023-11-26 22:55:27,480 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-26 22:55:57,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.928e+01 9.545e+01 1.017e+02 1.635e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 22:56:01,250 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 400, loss[loss=0.08479, simple_loss=0.1261, pruned_loss=0.01483, audio_tagging_loss=0.006935, over 16734.00 frames. 
], tot_loss[loss=0.06774, simple_loss=0.09123, pruned_loss=0.01248, audio_tagging_loss=0.009639, over 2639802.55 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:56:04,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3609853.3333333335, ans=0.125 2023-11-26 22:56:07,032 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-11-26 22:56:20,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3609920.0, ans=0.125 2023-11-26 22:56:23,725 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-26 22:56:30,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3609986.6666666665, ans=0.05 2023-11-26 22:56:50,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3610120.0, ans=0.125 2023-11-26 22:56:56,553 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 450, loss[loss=0.07772, simple_loss=0.1088, pruned_loss=0.01631, audio_tagging_loss=0.007015, over 15492.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09066, pruned_loss=0.01237, audio_tagging_loss=0.009411, over 2736381.17 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:57:18,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3610320.0, ans=0.1 2023-11-26 22:57:20,018 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-26 22:57:26,441 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:57:49,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 9.008e+01 9.621e+01 1.046e+02 1.513e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 22:57:53,100 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 500, loss[loss=0.06466, simple_loss=0.08874, pruned_loss=0.0117, audio_tagging_loss=0.008595, over 14885.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09026, pruned_loss=0.01233, audio_tagging_loss=0.009255, over 2809960.85 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:58:01,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3610520.0, ans=0.0 2023-11-26 22:58:15,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-26 22:58:23,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-11-26 22:58:42,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3610786.6666666665, ans=0.125 2023-11-26 22:58:48,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3610853.3333333335, ans=0.125 2023-11-26 22:58:49,537 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 550, loss[loss=0.05732, simple_loss=0.07207, pruned_loss=0.01041, audio_tagging_loss=0.01087, over 15322.00 frames. 
], tot_loss[loss=0.06636, simple_loss=0.08994, pruned_loss=0.01219, audio_tagging_loss=0.009198, over 2861831.26 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:58:52,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3610853.3333333335, ans=0.1 2023-11-26 22:58:57,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2023-11-26 22:58:59,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-26 22:59:11,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-26 22:59:23,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-26 22:59:24,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-26 22:59:38,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3611120.0, ans=0.125 2023-11-26 22:59:42,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.846e+01 9.307e+01 1.018e+02 1.266e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 22:59:44,775 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 600, loss[loss=0.07029, simple_loss=0.09917, pruned_loss=0.01324, audio_tagging_loss=0.007468, over 15881.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09016, pruned_loss=0.01222, audio_tagging_loss=0.009045, over 2898415.84 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:59:48,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3611186.6666666665, ans=0.0 2023-11-26 22:59:59,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3611253.3333333335, ans=0.2 2023-11-26 23:00:00,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3611253.3333333335, ans=0.2 2023-11-26 23:00:00,450 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2023-11-26 23:00:03,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3611253.3333333335, ans=0.0 2023-11-26 23:00:07,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-26 23:00:08,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3611320.0, ans=0.09899494936611666 2023-11-26 23:00:16,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. 
limit=15.0 2023-11-26 23:00:16,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3611320.0, ans=0.0 2023-11-26 23:00:22,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3611386.6666666665, ans=0.0 2023-11-26 23:00:30,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3611453.3333333335, ans=0.125 2023-11-26 23:00:41,275 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 650, loss[loss=0.078, simple_loss=0.1151, pruned_loss=0.01527, audio_tagging_loss=0.00519, over 14358.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08906, pruned_loss=0.01204, audio_tagging_loss=0.008941, over 2930887.70 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:00:44,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3611520.0, ans=10.0 2023-11-26 23:01:03,891 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-26 23:01:14,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3611720.0, ans=0.0 2023-11-26 23:01:18,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3611720.0, ans=0.2 2023-11-26 23:01:27,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2023-11-26 23:01:31,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3611786.6666666665, ans=0.125 2023-11-26 23:01:35,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.767e+01 9.555e+01 1.054e+02 1.204e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:01:36,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3611853.3333333335, ans=0.125 2023-11-26 23:01:37,478 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 700, loss[loss=0.09118, simple_loss=0.1173, pruned_loss=0.02414, audio_tagging_loss=0.008404, over 14334.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08966, pruned_loss=0.0122, audio_tagging_loss=0.008872, over 2960012.75 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:01:39,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3611853.3333333335, ans=0.0 2023-11-26 23:01:45,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3611853.3333333335, ans=0.1 2023-11-26 23:01:56,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3611920.0, ans=0.035 2023-11-26 23:01:59,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-26 23:02:08,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3611986.6666666665, ans=0.125 2023-11-26 23:02:29,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3612120.0, ans=0.125 2023-11-26 23:02:31,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2023-11-26 23:02:33,559 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 750, loss[loss=0.05714, simple_loss=0.07655, pruned_loss=0.01055, audio_tagging_loss=0.008321, over 15568.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08982, pruned_loss=0.01212, audio_tagging_loss=0.008905, over 2978020.38 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:02:55,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-26 23:03:04,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3612320.0, ans=0.125 2023-11-26 23:03:11,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3612386.6666666665, ans=0.125 2023-11-26 23:03:22,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3612453.3333333335, ans=0.0 2023-11-26 23:03:27,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.015e+01 9.591e+01 1.028e+02 1.389e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:03:29,962 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 800, loss[loss=0.07731, simple_loss=0.1087, pruned_loss=0.01679, audio_tagging_loss=0.00618, over 16118.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09013, pruned_loss=0.01237, audio_tagging_loss=0.0089, over 2999694.93 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:03:32,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3612520.0, ans=0.125 2023-11-26 23:03:52,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-26 23:03:54,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.88 vs. 
limit=22.5 2023-11-26 23:03:56,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3612653.3333333335, ans=0.125 2023-11-26 23:03:58,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3612653.3333333335, ans=0.125 2023-11-26 23:04:02,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612720.0, ans=0.1 2023-11-26 23:04:03,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3612720.0, ans=0.0 2023-11-26 23:04:07,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3612720.0, ans=0.0 2023-11-26 23:04:09,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3612720.0, ans=0.125 2023-11-26 23:04:11,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3612720.0, ans=0.125 2023-11-26 23:04:12,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0 2023-11-26 23:04:25,723 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 850, loss[loss=0.07288, simple_loss=0.1031, pruned_loss=0.01246, audio_tagging_loss=0.008884, over 15587.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0897, pruned_loss=0.01221, audio_tagging_loss=0.00897, over 3008594.78 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:04:48,332 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-26 23:04:55,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2023-11-26 23:05:03,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3613053.3333333335, ans=0.2 2023-11-26 23:05:06,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3613053.3333333335, ans=0.2 2023-11-26 23:05:10,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2023-11-26 23:05:19,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.857e+01 9.463e+01 1.019e+02 1.516e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 23:05:21,976 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 900, loss[loss=0.0787, simple_loss=0.1099, pruned_loss=0.01777, audio_tagging_loss=0.005995, over 15410.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08927, pruned_loss=0.01221, audio_tagging_loss=0.009046, over 3020350.19 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:05:31,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3613186.6666666665, ans=0.04949747468305833 2023-11-26 23:05:33,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3613253.3333333335, ans=0.1 2023-11-26 23:05:41,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3613253.3333333335, ans=0.125 2023-11-26 23:05:44,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-26 23:05:47,228 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=12.0 2023-11-26 23:05:51,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3613320.0, ans=0.125 2023-11-26 23:06:11,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3613453.3333333335, ans=0.0 2023-11-26 23:06:18,753 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 950, loss[loss=0.06617, simple_loss=0.08994, pruned_loss=0.01237, audio_tagging_loss=0.008826, over 14220.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08909, pruned_loss=0.01207, audio_tagging_loss=0.00897, over 3021101.11 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:06:40,879 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-26 23:07:02,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3613786.6666666665, ans=0.125 2023-11-26 23:07:06,132 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:07:10,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2023-11-26 23:07:13,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.708e+01 9.328e+01 1.031e+02 1.282e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 23:07:14,475 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1000, loss[loss=0.06396, simple_loss=0.08442, pruned_loss=0.01298, audio_tagging_loss=0.008762, over 15817.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0896, pruned_loss=0.01216, audio_tagging_loss=0.008765, over 3026338.86 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:07:23,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3613853.3333333335, ans=0.125 2023-11-26 23:07:37,457 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-26 23:07:38,445 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 23:07:44,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3613986.6666666665, ans=0.0 2023-11-26 23:07:44,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2023-11-26 23:07:57,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2023-11-26 23:08:08,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-26 23:08:10,450 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1050, loss[loss=0.0713, simple_loss=0.1001, pruned_loss=0.01197, audio_tagging_loss=0.009267, over 14944.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08958, pruned_loss=0.01206, audio_tagging_loss=0.008746, over 3026264.04 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:08:19,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3614186.6666666665, ans=0.5 2023-11-26 23:08:33,555 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-26 23:08:43,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3614386.6666666665, ans=0.125 2023-11-26 23:08:49,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2023-11-26 23:09:05,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.633e+01 9.310e+01 1.034e+02 1.583e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 23:09:05,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3614520.0, ans=0.0 2023-11-26 23:09:06,529 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1100, loss[loss=0.06787, simple_loss=0.08986, pruned_loss=0.01759, audio_tagging_loss=0.005349, over 14508.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08974, pruned_loss=0.01207, audio_tagging_loss=0.008595, over 3037129.18 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:09:09,292 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 23:09:25,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3614586.6666666665, ans=0.2 2023-11-26 23:09:27,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3614586.6666666665, ans=0.125 2023-11-26 23:09:29,054 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-26 23:09:29,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3614653.3333333335, ans=0.0 2023-11-26 23:09:30,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3614653.3333333335, ans=0.125 2023-11-26 23:09:40,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3614720.0, ans=0.0 2023-11-26 23:10:02,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-26 23:10:02,960 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1150, loss[loss=0.05196, simple_loss=0.07589, pruned_loss=0.007154, audio_tagging_loss=0.006865, over 16297.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08952, pruned_loss=0.01194, audio_tagging_loss=0.008623, over 3041742.83 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:10:10,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-26 23:10:11,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3614853.3333333335, ans=15.0 2023-11-26 23:10:13,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3614920.0, ans=0.2 2023-11-26 23:10:13,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3614920.0, ans=0.0 2023-11-26 23:10:25,032 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-26 23:10:45,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3615053.3333333335, ans=0.0 2023-11-26 23:10:46,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3615053.3333333335, ans=0.07 2023-11-26 23:10:52,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3615120.0, ans=0.0 2023-11-26 23:10:57,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.828e+01 9.396e+01 9.982e+01 1.339e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 23:10:59,041 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1200, loss[loss=0.05011, simple_loss=0.06236, pruned_loss=0.008318, audio_tagging_loss=0.01061, over 16236.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08908, pruned_loss=0.01193, audio_tagging_loss=0.008648, over 3044478.63 frames. 
], batch size: 64, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:11:01,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3615186.6666666665, ans=0.125 2023-11-26 23:11:04,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3615186.6666666665, ans=0.125 2023-11-26 23:11:21,958 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-26 23:11:24,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3615320.0, ans=0.125 2023-11-26 23:11:33,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3615386.6666666665, ans=0.125 2023-11-26 23:11:36,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3615386.6666666665, ans=0.125 2023-11-26 23:11:47,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3615453.3333333335, ans=0.125 2023-11-26 23:11:54,894 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1250, loss[loss=0.06566, simple_loss=0.09386, pruned_loss=0.01108, audio_tagging_loss=0.007653, over 15990.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08986, pruned_loss=0.01219, audio_tagging_loss=0.008597, over 3041825.43 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:12:05,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3615586.6666666665, ans=0.2 2023-11-26 23:12:17,450 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-26 23:12:25,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615653.3333333335, ans=0.1 2023-11-26 23:12:31,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3615720.0, ans=0.2 2023-11-26 23:12:32,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3615720.0, ans=0.125 2023-11-26 23:12:38,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3615786.6666666665, ans=0.2 2023-11-26 23:12:42,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3615786.6666666665, ans=0.125 2023-11-26 23:12:49,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.892e+01 9.390e+01 1.002e+02 1.440e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 23:12:51,070 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1300, loss[loss=0.07328, simple_loss=0.09908, pruned_loss=0.01551, audio_tagging_loss=0.008233, over 14770.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09031, pruned_loss=0.01222, audio_tagging_loss=0.008641, over 3038278.39 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:12:52,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615853.3333333335, ans=0.1 2023-11-26 23:12:53,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3615853.3333333335, ans=0.125 2023-11-26 23:13:00,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3615853.3333333335, ans=0.2 2023-11-26 23:13:10,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615920.0, ans=0.1 2023-11-26 23:13:12,840 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-26 23:13:15,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3615986.6666666665, ans=0.035 2023-11-26 23:13:21,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.02 vs. limit=10.0 2023-11-26 23:13:22,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3615986.6666666665, ans=15.0 2023-11-26 23:13:29,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3616053.3333333335, ans=0.07 2023-11-26 23:13:30,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-26 23:13:33,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3616053.3333333335, ans=0.125 2023-11-26 23:13:38,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2023-11-26 23:13:47,127 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1350, loss[loss=0.07212, simple_loss=0.1003, pruned_loss=0.01217, audio_tagging_loss=0.009791, over 14836.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.0905, pruned_loss=0.01225, audio_tagging_loss=0.008665, over 3037759.77 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:14:07,979 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0 2023-11-26 23:14:09,905 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-26 23:14:16,603 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2023-11-26 23:14:18,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3616320.0, ans=0.2 2023-11-26 23:14:18,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2023-11-26 23:14:25,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. 
limit=15.0 2023-11-26 23:14:26,375 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:14:26,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3616386.6666666665, ans=0.2 2023-11-26 23:14:36,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-26 23:14:40,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3616453.3333333335, ans=0.2 2023-11-26 23:14:41,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.028e+01 9.621e+01 1.020e+02 1.402e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 23:14:42,820 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1400, loss[loss=0.05443, simple_loss=0.07595, pruned_loss=0.006318, audio_tagging_loss=0.01014, over 16630.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09004, pruned_loss=0.01231, audio_tagging_loss=0.008678, over 3045437.19 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:14:43,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3616520.0, ans=0.1 2023-11-26 23:14:46,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3616520.0, ans=0.07 2023-11-26 23:15:05,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-26 23:15:23,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2023-11-26 23:15:39,435 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1450, loss[loss=0.08095, simple_loss=0.1115, pruned_loss=0.01592, audio_tagging_loss=0.009271, over 16353.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09038, pruned_loss=0.01241, audio_tagging_loss=0.008681, over 3046235.32 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:15:40,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3616853.3333333335, ans=0.1 2023-11-26 23:15:41,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3616853.3333333335, ans=0.125 2023-11-26 23:15:43,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3616853.3333333335, ans=0.0 2023-11-26 23:15:44,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3616853.3333333335, ans=0.0 2023-11-26 23:16:01,461 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-26 23:16:09,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3616986.6666666665, ans=0.0 2023-11-26 23:16:09,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3616986.6666666665, ans=0.125 2023-11-26 23:16:20,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3617053.3333333335, ans=0.2 2023-11-26 23:16:33,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-11-26 23:16:34,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 9.029e+01 9.878e+01 1.085e+02 1.417e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-26 23:16:34,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3617186.6666666665, ans=0.125 2023-11-26 23:16:35,647 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1500, loss[loss=0.04299, simple_loss=0.05155, pruned_loss=0.006843, audio_tagging_loss=0.01037, over 13708.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08952, pruned_loss=0.01233, audio_tagging_loss=0.008769, over 3041422.43 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:16:42,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=15.0 2023-11-26 23:16:51,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3617253.3333333335, ans=0.125 2023-11-26 23:16:58,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-26 23:17:01,760 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:17:31,309 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1550, loss[loss=0.05926, simple_loss=0.0839, pruned_loss=0.008384, audio_tagging_loss=0.008927, over 15365.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09043, pruned_loss=0.01238, audio_tagging_loss=0.008772, over 3042813.80 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:17:34,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.85 vs. 
limit=10.0 2023-11-26 23:17:54,317 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-26 23:18:00,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-11-26 23:18:02,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-26 23:18:22,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3617786.6666666665, ans=0.125 2023-11-26 23:18:26,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.195e+01 9.800e+01 1.042e+02 1.280e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 23:18:27,648 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1600, loss[loss=0.09306, simple_loss=0.1249, pruned_loss=0.02323, audio_tagging_loss=0.007394, over 15276.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08965, pruned_loss=0.01224, audio_tagging_loss=0.008907, over 3043512.69 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:18:44,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3617920.0, ans=0.1 2023-11-26 23:18:49,965 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-26 23:19:15,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3618120.0, ans=0.125 2023-11-26 23:19:18,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3618120.0, ans=0.125 2023-11-26 23:19:23,998 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1650, loss[loss=0.06041, simple_loss=0.08262, pruned_loss=0.01029, audio_tagging_loss=0.008809, over 14719.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09005, pruned_loss=0.01236, audio_tagging_loss=0.008867, over 3046024.38 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:19:25,652 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 23:19:26,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3618186.6666666665, ans=0.125 2023-11-26 23:19:29,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3618186.6666666665, ans=0.125 2023-11-26 23:19:35,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.38 vs. 
limit=15.0 2023-11-26 23:19:40,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3618253.3333333335, ans=0.04949747468305833 2023-11-26 23:19:45,815 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-26 23:19:47,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3618320.0, ans=0.125 2023-11-26 23:19:50,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3618320.0, ans=0.2 2023-11-26 23:19:52,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3618320.0, ans=0.5 2023-11-26 23:19:53,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3618320.0, ans=0.2 2023-11-26 23:20:03,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618386.6666666665, ans=0.1 2023-11-26 23:20:05,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3618386.6666666665, ans=0.04949747468305833 2023-11-26 23:20:19,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.999e+01 9.528e+01 1.009e+02 1.834e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 23:20:19,478 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1700, loss[loss=0.06794, simple_loss=0.104, pruned_loss=0.01158, audio_tagging_loss=0.004375, over 15278.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08957, pruned_loss=0.01213, audio_tagging_loss=0.008954, over 3044368.18 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:20:37,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=22.5 2023-11-26 23:20:41,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-26 23:21:15,669 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1750, loss[loss=0.05564, simple_loss=0.07759, pruned_loss=0.005146, audio_tagging_loss=0.01169, over 15027.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08965, pruned_loss=0.01209, audio_tagging_loss=0.008919, over 3042581.68 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:21:17,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3618853.3333333335, ans=0.125 2023-11-26 23:21:23,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618853.3333333335, ans=0.1 2023-11-26 23:21:33,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3618920.0, ans=0.0 2023-11-26 23:21:38,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-26 23:21:42,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. 
limit=22.5 2023-11-26 23:21:42,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3618986.6666666665, ans=0.0 2023-11-26 23:22:07,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3619120.0, ans=0.125 2023-11-26 23:22:11,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.875e+01 9.667e+01 1.011e+02 1.829e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 23:22:11,456 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1800, loss[loss=0.0605, simple_loss=0.08369, pruned_loss=0.01296, audio_tagging_loss=0.005696, over 14739.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08961, pruned_loss=0.01209, audio_tagging_loss=0.008747, over 3045621.67 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:22:28,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3619253.3333333335, ans=0.125 2023-11-26 23:22:31,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3619253.3333333335, ans=0.2 2023-11-26 23:22:34,082 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-26 23:22:38,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3619320.0, ans=0.0 2023-11-26 23:22:45,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3619386.6666666665, ans=0.0 2023-11-26 23:22:46,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3619386.6666666665, ans=0.125 2023-11-26 23:23:07,870 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1850, loss[loss=0.07304, simple_loss=0.1006, pruned_loss=0.01629, audio_tagging_loss=0.006461, over 14943.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08978, pruned_loss=0.01223, audio_tagging_loss=0.008676, over 3046169.97 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:23:09,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3619520.0, ans=0.125 2023-11-26 23:23:10,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3619520.0, ans=0.125 2023-11-26 23:23:19,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3619586.6666666665, ans=0.1 2023-11-26 23:23:30,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-26 23:23:39,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3619653.3333333335, ans=0.125 2023-11-26 23:23:42,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3619720.0, ans=0.125 2023-11-26 23:23:44,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3619720.0, ans=0.125 2023-11-26 23:24:04,162 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1900, loss[loss=0.06004, simple_loss=0.0832, pruned_loss=0.009141, audio_tagging_loss=0.009303, over 16412.00 frames. 
], tot_loss[loss=0.06591, simple_loss=0.09018, pruned_loss=0.01224, audio_tagging_loss=0.008583, over 3052213.64 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:24:05,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 9.189e+01 9.752e+01 1.031e+02 1.213e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 23:24:12,228 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2023-11-26 23:24:15,204 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:24:19,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=12.0 2023-11-26 23:24:23,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-11-26 23:24:26,778 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-26 23:24:29,597 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=12.0 2023-11-26 23:24:31,885 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-26 23:24:31,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2023-11-26 23:24:38,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3620053.3333333335, ans=0.125 2023-11-26 23:24:47,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-26 23:24:50,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3620120.0, ans=0.95 2023-11-26 23:24:55,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3620120.0, ans=0.125 2023-11-26 23:24:59,802 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1950, loss[loss=0.05636, simple_loss=0.08208, pruned_loss=0.008462, audio_tagging_loss=0.006864, over 15785.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08906, pruned_loss=0.01213, audio_tagging_loss=0.008654, over 3039990.70 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:25:01,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3620186.6666666665, ans=0.125 2023-11-26 23:25:05,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3620186.6666666665, ans=0.125 2023-11-26 23:25:09,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3620186.6666666665, ans=0.1 2023-11-26 23:25:10,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3620253.3333333335, ans=0.125 2023-11-26 23:25:22,779 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-26 23:25:24,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3620320.0, ans=0.125 2023-11-26 23:25:30,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3620320.0, ans=0.2 2023-11-26 23:25:37,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-26 23:25:38,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3620386.6666666665, ans=0.125 2023-11-26 23:25:44,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3620453.3333333335, ans=0.125 2023-11-26 23:25:56,325 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2000, loss[loss=0.06421, simple_loss=0.08665, pruned_loss=0.01209, audio_tagging_loss=0.008792, over 15405.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08891, pruned_loss=0.01207, audio_tagging_loss=0.008616, over 3039965.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:25:57,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.817e+01 9.525e+01 1.016e+02 1.209e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 23:26:07,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3620586.6666666665, ans=0.0 2023-11-26 23:26:15,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3620586.6666666665, ans=0.125 2023-11-26 23:26:18,545 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-26 23:26:36,600 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2023-11-26 23:26:52,297 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2050, loss[loss=0.07022, simple_loss=0.08853, pruned_loss=0.01361, audio_tagging_loss=0.01234, over 14838.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08929, pruned_loss=0.01214, audio_tagging_loss=0.008583, over 3042944.04 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:26:58,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.89 vs. 
limit=15.0 2023-11-26 23:27:05,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3620920.0, ans=0.2 2023-11-26 23:27:14,771 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-26 23:27:25,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3621053.3333333335, ans=0.0 2023-11-26 23:27:25,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3621053.3333333335, ans=0.0 2023-11-26 23:27:38,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-26 23:27:38,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3621120.0, ans=0.125 2023-11-26 23:27:48,127 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2100, loss[loss=0.07479, simple_loss=0.1056, pruned_loss=0.01495, audio_tagging_loss=0.007047, over 15519.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08949, pruned_loss=0.01224, audio_tagging_loss=0.008519, over 3048144.04 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:27:49,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3621186.6666666665, ans=0.0 2023-11-26 23:27:50,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.873e+01 9.430e+01 1.020e+02 1.802e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 23:28:01,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=12.0 2023-11-26 23:28:11,001 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-26 23:28:13,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3621320.0, ans=0.125 2023-11-26 23:28:15,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3621320.0, ans=0.1 2023-11-26 23:28:35,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3621453.3333333335, ans=0.125 2023-11-26 23:28:44,525 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2150, loss[loss=0.05462, simple_loss=0.06917, pruned_loss=0.009896, audio_tagging_loss=0.01014, over 15327.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08948, pruned_loss=0.01232, audio_tagging_loss=0.008525, over 3047300.42 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:28:55,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2023-11-26 23:29:07,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-26 23:29:17,491 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 23:29:26,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3621720.0, ans=0.0 2023-11-26 23:29:41,010 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2200, loss[loss=0.08094, simple_loss=0.1141, pruned_loss=0.01664, audio_tagging_loss=0.007242, over 14947.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09066, pruned_loss=0.01243, audio_tagging_loss=0.008528, over 3049322.73 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:29:43,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.935e+01 9.696e+01 1.032e+02 1.602e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 23:29:44,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3621853.3333333335, ans=0.0 2023-11-26 23:29:54,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-26 23:30:03,471 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-26 23:30:22,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3622053.3333333335, ans=0.0 2023-11-26 23:30:36,862 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2250, loss[loss=0.0591, simple_loss=0.07394, pruned_loss=0.01312, audio_tagging_loss=0.009012, over 14899.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09023, pruned_loss=0.0125, audio_tagging_loss=0.008642, over 3040978.31 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:30:37,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3622186.6666666665, ans=0.125 2023-11-26 23:30:48,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3622253.3333333335, ans=0.5 2023-11-26 23:30:58,754 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-26 23:31:32,012 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2300, loss[loss=0.06927, simple_loss=0.09687, pruned_loss=0.009541, audio_tagging_loss=0.0113, over 14110.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08948, pruned_loss=0.01238, audio_tagging_loss=0.008722, over 3042411.07 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:31:34,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.796e+01 9.547e+01 1.006e+02 1.160e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 23:31:41,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-26 23:31:45,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3622586.6666666665, ans=0.2 2023-11-26 23:31:55,044 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-26 23:32:11,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.72 vs. 
limit=12.0 2023-11-26 23:32:16,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3622786.6666666665, ans=0.07 2023-11-26 23:32:20,304 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:32:27,719 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2350, loss[loss=0.05819, simple_loss=0.07844, pruned_loss=0.009444, audio_tagging_loss=0.009532, over 15076.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09066, pruned_loss=0.01257, audio_tagging_loss=0.008779, over 3043271.32 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:32:31,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2023-11-26 23:32:38,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3622853.3333333335, ans=0.125 2023-11-26 23:32:51,414 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-26 23:32:54,806 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:33:09,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3623053.3333333335, ans=0.125 2023-11-26 23:33:25,278 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2400, loss[loss=0.08471, simple_loss=0.114, pruned_loss=0.01774, audio_tagging_loss=0.009971, over 15170.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09162, pruned_loss=0.01257, audio_tagging_loss=0.008759, over 3042375.75 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:33:27,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.979e+01 9.586e+01 1.037e+02 1.629e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 23:33:36,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3623253.3333333335, ans=0.0 2023-11-26 23:33:47,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-26 23:33:48,030 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-26 23:34:21,515 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2450, loss[loss=0.04252, simple_loss=0.05267, pruned_loss=0.006264, audio_tagging_loss=0.009916, over 15625.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09111, pruned_loss=0.01253, audio_tagging_loss=0.008801, over 3039784.05 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:34:35,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. 
limit=15.0 2023-11-26 23:34:39,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3623586.6666666665, ans=0.125 2023-11-26 23:34:41,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3623586.6666666665, ans=0.125 2023-11-26 23:34:41,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3623586.6666666665, ans=0.0 2023-11-26 23:34:44,214 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-26 23:34:47,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3623653.3333333335, ans=0.125 2023-11-26 23:35:06,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3623786.6666666665, ans=0.0 2023-11-26 23:35:16,741 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2500, loss[loss=0.07012, simple_loss=0.09908, pruned_loss=0.01261, audio_tagging_loss=0.007965, over 15325.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09097, pruned_loss=0.01247, audio_tagging_loss=0.008947, over 3042500.56 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:35:18,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.886e+01 9.376e+01 1.002e+02 1.338e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 23:35:20,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3623853.3333333335, ans=0.1 2023-11-26 23:35:34,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3623920.0, ans=0.125 2023-11-26 23:35:40,205 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-26 23:35:44,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3623986.6666666665, ans=0.125 2023-11-26 23:35:47,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3623986.6666666665, ans=0.125 2023-11-26 23:36:12,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=12.0 2023-11-26 23:36:14,147 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2550, loss[loss=0.05622, simple_loss=0.073, pruned_loss=0.01016, audio_tagging_loss=0.009558, over 14949.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09032, pruned_loss=0.01246, audio_tagging_loss=0.008828, over 3043878.01 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:36:18,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3624186.6666666665, ans=0.1 2023-11-26 23:36:19,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3624186.6666666665, ans=0.1 2023-11-26 23:36:27,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3624253.3333333335, ans=0.2 2023-11-26 23:36:27,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3624253.3333333335, ans=0.125 2023-11-26 23:36:35,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3624320.0, ans=0.125 2023-11-26 23:36:36,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-26 23:36:45,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3624320.0, ans=0.0 2023-11-26 23:36:57,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3624453.3333333335, ans=0.035 2023-11-26 23:37:01,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3624453.3333333335, ans=0.0 2023-11-26 23:37:07,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3624453.3333333335, ans=0.125 2023-11-26 23:37:07,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3624453.3333333335, ans=0.1 2023-11-26 23:37:09,927 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2600, loss[loss=0.0586, simple_loss=0.08436, pruned_loss=0.008742, audio_tagging_loss=0.00768, over 15310.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09034, pruned_loss=0.01232, audio_tagging_loss=0.008705, over 3044849.44 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:37:11,962 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.743e+01 9.424e+01 1.014e+02 1.712e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 23:37:13,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3624520.0, ans=0.125 2023-11-26 23:37:18,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3624520.0, ans=0.125 2023-11-26 23:37:31,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-26 23:37:55,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-26 23:38:05,156 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2650, loss[loss=0.06592, simple_loss=0.09243, pruned_loss=0.01183, audio_tagging_loss=0.007872, over 16186.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09058, pruned_loss=0.01226, audio_tagging_loss=0.008678, over 3046092.50 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:38:28,283 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-26 23:38:50,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=12.0 2023-11-26 23:38:58,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3625120.0, ans=0.125 2023-11-26 23:39:01,795 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2700, loss[loss=0.05456, simple_loss=0.07176, pruned_loss=0.01067, audio_tagging_loss=0.00802, over 14754.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08992, pruned_loss=0.01216, audio_tagging_loss=0.008575, over 3046916.13 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:39:03,858 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.924e+01 9.565e+01 1.006e+02 1.395e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 23:39:20,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3625253.3333333335, ans=0.0 2023-11-26 23:39:24,312 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-26 23:39:41,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2023-11-26 23:39:50,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0 2023-11-26 23:39:52,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3625453.3333333335, ans=0.125 2023-11-26 23:39:58,528 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2750, loss[loss=0.06506, simple_loss=0.08805, pruned_loss=0.01385, audio_tagging_loss=0.007192, over 14896.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08943, pruned_loss=0.01213, audio_tagging_loss=0.008554, over 3046004.12 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:40:20,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-26 23:40:30,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.02 vs. limit=15.0 2023-11-26 23:40:36,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3625720.0, ans=0.5 2023-11-26 23:40:38,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3625720.0, ans=0.125 2023-11-26 23:40:42,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3625786.6666666665, ans=0.125 2023-11-26 23:40:44,669 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 23:40:45,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3625786.6666666665, ans=0.1 2023-11-26 23:40:47,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2023-11-26 23:40:53,080 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2800, loss[loss=0.06379, simple_loss=0.08308, pruned_loss=0.01334, audio_tagging_loss=0.008909, over 14450.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08895, pruned_loss=0.01219, audio_tagging_loss=0.008556, over 3039258.32 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:40:53,265 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:40:55,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.871e+01 8.947e+01 9.554e+01 1.028e+02 1.223e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:40:55,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=22.5 2023-11-26 23:41:12,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3625920.0, ans=0.05 2023-11-26 23:41:13,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3625920.0, ans=0.125 2023-11-26 23:41:15,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-26 23:41:42,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3626120.0, ans=0.1 2023-11-26 23:41:47,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3626120.0, ans=0.125 2023-11-26 23:41:49,641 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2850, loss[loss=0.08252, simple_loss=0.1121, pruned_loss=0.01933, audio_tagging_loss=0.007141, over 15338.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08951, pruned_loss=0.01226, audio_tagging_loss=0.00858, over 3037305.58 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:41:58,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3626186.6666666665, ans=0.125 2023-11-26 23:42:00,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3626253.3333333335, ans=0.2 2023-11-26 23:42:12,195 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-26 23:42:14,402 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:42:23,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2023-11-26 23:42:45,064 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2900, loss[loss=0.0769, simple_loss=0.1113, pruned_loss=0.01589, audio_tagging_loss=0.005352, over 15845.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08949, pruned_loss=0.01218, audio_tagging_loss=0.008615, over 3038558.12 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:42:47,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.936e+01 9.597e+01 1.046e+02 1.381e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 23:43:03,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3626586.6666666665, ans=0.0 2023-11-26 23:43:08,353 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-26 23:43:17,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3626653.3333333335, ans=0.0 2023-11-26 23:43:30,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3626720.0, ans=0.1 2023-11-26 23:43:44,235 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2950, loss[loss=0.05914, simple_loss=0.08667, pruned_loss=0.008521, audio_tagging_loss=0.007288, over 15844.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08978, pruned_loss=0.01222, audio_tagging_loss=0.008664, over 3036984.46 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:43:51,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2023-11-26 23:44:06,806 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-26 23:44:21,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3627053.3333333335, ans=0.125 2023-11-26 23:44:25,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3627053.3333333335, ans=0.125 2023-11-26 23:44:27,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3627053.3333333335, ans=0.125 2023-11-26 23:44:32,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3627120.0, ans=0.5 2023-11-26 23:44:34,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2023-11-26 23:44:39,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3627186.6666666665, ans=0.0 2023-11-26 23:44:40,280 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3000, loss[loss=0.07119, simple_loss=0.09051, pruned_loss=0.01568, audio_tagging_loss=0.01025, over 15736.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09058, pruned_loss=0.01235, audio_tagging_loss=0.008622, over 3042139.98 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:44:40,281 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-26 23:45:12,598 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.0572, simple_loss=0.05043, pruned_loss=0.00523, audio_tagging_loss=0.02676, over 4681554.00 frames. 
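Note: the loss fields in the records above are internally consistent, and the recurring "Exclude cut" warnings follow from a simple frame/token length check. Below is a minimal sketch reproducing both from numbers in this log. It is illustrative only, not the actual train_asr.py code: the 0.5 and 1.0 scales match the simple_loss_scale and audio_tagging_loss_scale this run was launched with, the subsampling arithmetic is the usual icefall convolutional front end for a factor of 4, and the helper names are hypothetical.

def total_loss(simple, pruned, audio_tagging, simple_scale=0.5, at_scale=1.0):
    # tot_loss = simple_loss_scale * simple_loss + pruned_loss
    #            + audio_tagging_loss_scale * audio_tagging_loss
    return simple_scale * simple + pruned + at_scale * audio_tagging

# Validation record above: loss=0.0572, simple_loss=0.05043,
# pruned_loss=0.00523, audio_tagging_loss=0.02676.
assert abs(total_loss(0.05043, 0.00523, 0.02676) - 0.0572) < 5e-5

# The same sum matches the per-batch records, e.g. batch 2000:
# 0.5 * 0.08891 + 0.01207 + 0.008616 = 0.06514.
assert abs(total_loss(0.08891, 0.01207, 0.008616) - 0.06514) < 5e-5

def frames_after_subsampling(num_frames):
    # Assumed front-end arithmetic for subsampling factor 4; it maps the
    # warned cuts' 100 input frames to the logged 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    # Transducer training needs at least one encoder frame per token, so a
    # dummy-text AudioSet cut with 23 post-subsampling frames but 24 BPE
    # tokens is excluded from training.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)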
2023-11-26 23:45:12,599 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-26 23:45:12,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3627186.6666666665, ans=0.2 2023-11-26 23:45:15,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.002e+01 9.589e+01 1.016e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:45:28,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3627253.3333333335, ans=0.0 2023-11-26 23:45:35,150 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-26 23:45:40,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3627320.0, ans=0.0 2023-11-26 23:45:51,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3627386.6666666665, ans=0.125 2023-11-26 23:46:08,626 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3050, loss[loss=0.07017, simple_loss=0.09774, pruned_loss=0.01424, audio_tagging_loss=0.007061, over 16655.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09141, pruned_loss=0.01243, audio_tagging_loss=0.008619, over 3044188.17 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:46:30,851 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-26 23:46:39,905 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:46:53,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3627786.6666666665, ans=0.125 2023-11-26 23:46:54,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-11-26 23:46:57,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3627786.6666666665, ans=0.0 2023-11-26 23:47:03,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=12.0 2023-11-26 23:47:04,317 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3100, loss[loss=0.06642, simple_loss=0.09367, pruned_loss=0.01041, audio_tagging_loss=0.009177, over 15196.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09119, pruned_loss=0.01254, audio_tagging_loss=0.008708, over 3041741.03 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:47:08,070 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.067e+01 9.651e+01 1.052e+02 1.316e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-26 23:47:24,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3627920.0, ans=0.125 2023-11-26 23:47:27,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-26 23:47:27,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3627986.6666666665, ans=0.2 2023-11-26 23:47:27,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627986.6666666665, ans=0.1 2023-11-26 23:47:49,401 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2023-11-26 23:47:59,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2023-11-26 23:48:01,331 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3150, loss[loss=0.0614, simple_loss=0.08844, pruned_loss=0.009837, audio_tagging_loss=0.007344, over 15375.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09105, pruned_loss=0.01249, audio_tagging_loss=0.008781, over 3041066.49 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:48:11,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3628253.3333333335, ans=0.07 2023-11-26 23:48:21,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3628253.3333333335, ans=15.0 2023-11-26 23:48:23,395 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544250 2023-11-26 23:48:23,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3628320.0, ans=0.125 2023-11-26 23:48:48,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3628453.3333333335, ans=0.2 2023-11-26 23:48:55,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2023-11-26 23:48:56,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3628520.0, ans=0.2 2023-11-26 23:48:57,457 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3200, loss[loss=0.09345, simple_loss=0.1239, pruned_loss=0.02178, audio_tagging_loss=0.009704, over 14842.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09008, pruned_loss=0.01225, audio_tagging_loss=0.008847, over 3042685.32 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:48:58,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. 
limit=15.0 2023-11-26 23:49:00,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.824e+01 9.434e+01 1.022e+02 1.249e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 23:49:04,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3628520.0, ans=0.0 2023-11-26 23:49:09,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3628586.6666666665, ans=0.0 2023-11-26 23:49:15,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3628586.6666666665, ans=12.0 2023-11-26 23:49:19,856 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-26 23:49:29,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3628653.3333333335, ans=0.2 2023-11-26 23:49:35,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3628720.0, ans=0.05 2023-11-26 23:49:53,381 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3250, loss[loss=0.04266, simple_loss=0.05144, pruned_loss=0.007517, audio_tagging_loss=0.009427, over 15858.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08911, pruned_loss=0.01212, audio_tagging_loss=0.008892, over 3050240.81 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:49:55,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3628853.3333333335, ans=0.04949747468305833 2023-11-26 23:50:12,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-11-26 23:50:14,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3628986.6666666665, ans=0.035 2023-11-26 23:50:15,642 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-26 23:50:48,934 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3300, loss[loss=0.08023, simple_loss=0.1153, pruned_loss=0.01586, audio_tagging_loss=0.006738, over 14596.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08926, pruned_loss=0.012, audio_tagging_loss=0.009022, over 3047424.89 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:50:52,764 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.136e+01 9.828e+01 1.104e+02 1.362e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 23:50:56,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3629186.6666666665, ans=0.125 2023-11-26 23:51:00,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3629253.3333333335, ans=0.125 2023-11-26 23:51:07,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0 2023-11-26 23:51:11,479 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-26 23:51:12,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3629320.0, ans=0.125 2023-11-26 23:51:33,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3629453.3333333335, ans=0.0 2023-11-26 23:51:36,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3629453.3333333335, ans=0.0 2023-11-26 23:51:45,137 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3350, loss[loss=0.06939, simple_loss=0.1035, pruned_loss=0.01189, audio_tagging_loss=0.005738, over 14886.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08935, pruned_loss=0.01208, audio_tagging_loss=0.008953, over 3056374.97 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:51:53,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3629520.0, ans=0.125 2023-11-26 23:51:55,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0 2023-11-26 23:52:07,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-26 23:52:12,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2023-11-26 23:52:19,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3629720.0, ans=0.0 2023-11-26 23:52:27,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3629720.0, ans=0.125 2023-11-26 23:52:40,871 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3400, loss[loss=0.0636, simple_loss=0.0808, pruned_loss=0.01575, audio_tagging_loss=0.007449, over 14971.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08899, pruned_loss=0.01212, audio_tagging_loss=0.008821, over 3058846.55 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:52:45,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.870e+01 9.488e+01 1.024e+02 1.498e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 23:52:48,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3629853.3333333335, ans=0.125 2023-11-26 23:52:49,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3629853.3333333335, ans=0.125 2023-11-26 23:52:51,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3629920.0, ans=0.0 2023-11-26 23:52:53,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3629920.0, ans=0.125 2023-11-26 23:52:54,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3629920.0, ans=0.125 2023-11-26 23:52:59,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3629920.0, ans=0.1 2023-11-26 23:52:59,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3629920.0, ans=0.1 2023-11-26 23:53:03,571 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-26 23:53:05,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-26 23:53:11,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3629986.6666666665, ans=0.025 2023-11-26 23:53:37,205 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3450, loss[loss=0.06428, simple_loss=0.08275, pruned_loss=0.009227, audio_tagging_loss=0.01367, over 14868.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08937, pruned_loss=0.01227, audio_tagging_loss=0.008707, over 3054506.74 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:53:44,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=3630186.6666666665, ans=22.5 2023-11-26 23:53:52,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3630253.3333333335, ans=0.125 2023-11-26 23:53:58,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-26 23:54:08,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3630320.0, ans=0.2 2023-11-26 23:54:24,263 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:54:28,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3630453.3333333335, ans=0.04949747468305833 2023-11-26 23:54:32,551 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3500, loss[loss=0.04888, simple_loss=0.05536, pruned_loss=0.01152, audio_tagging_loss=0.009681, over 14328.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08951, pruned_loss=0.01235, audio_tagging_loss=0.008704, over 3049127.80 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:54:36,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 9.117e+01 9.795e+01 1.053e+02 1.409e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 23:54:47,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3630586.6666666665, ans=0.0 2023-11-26 23:54:47,363 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2023-11-26 23:54:53,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3630586.6666666665, ans=0.125 2023-11-26 23:54:54,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3630653.3333333335, ans=0.05 2023-11-26 23:54:55,526 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-26 23:55:01,003 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:55:27,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3630853.3333333335, ans=0.125 2023-11-26 23:55:28,214 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3550, loss[loss=0.04088, simple_loss=0.05392, pruned_loss=0.005835, audio_tagging_loss=0.008081, over 14797.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0892, pruned_loss=0.01221, audio_tagging_loss=0.008598, over 3044567.12 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:55:51,555 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-26 23:55:52,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-11-26 23:55:56,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3630986.6666666665, ans=0.2 2023-11-26 23:56:13,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3631120.0, ans=0.0 2023-11-26 23:56:15,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3631120.0, ans=0.0 2023-11-26 23:56:20,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631120.0, ans=0.1 2023-11-26 23:56:25,419 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3600, loss[loss=0.05586, simple_loss=0.0767, pruned_loss=0.01181, audio_tagging_loss=0.005696, over 13789.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08982, pruned_loss=0.01243, audio_tagging_loss=0.008528, over 3042287.51 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:56:29,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.770e+01 9.299e+01 1.012e+02 1.507e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 23:56:34,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3631186.6666666665, ans=0.0 2023-11-26 23:56:39,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3631253.3333333335, ans=0.125 2023-11-26 23:56:40,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3631253.3333333335, ans=0.125 2023-11-26 23:56:43,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3631253.3333333335, ans=0.125 2023-11-26 23:56:47,216 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-26 23:56:52,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-26 23:57:08,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3631453.3333333335, ans=0.025 2023-11-26 23:57:16,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3631453.3333333335, ans=0.125 2023-11-26 23:57:20,896 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3650, loss[loss=0.05554, simple_loss=0.07987, pruned_loss=0.006933, audio_tagging_loss=0.008667, over 14938.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08915, pruned_loss=0.01225, audio_tagging_loss=0.008516, over 3044827.76 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:57:25,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631520.0, ans=0.1 2023-11-26 23:57:29,653 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:57:43,360 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-26 23:57:46,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3631653.3333333335, ans=0.0 2023-11-26 23:57:47,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2023-11-26 23:57:48,296 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:57:51,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-11-26 23:57:55,205 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. 
limit=6.0 2023-11-26 23:57:57,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3631720.0, ans=0.125 2023-11-26 23:58:13,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3631786.6666666665, ans=0.0 2023-11-26 23:58:16,363 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3700, loss[loss=0.07293, simple_loss=0.1048, pruned_loss=0.01228, audio_tagging_loss=0.008268, over 15515.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0901, pruned_loss=0.01214, audio_tagging_loss=0.008556, over 3049917.36 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:58:20,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.914e+01 9.498e+01 1.020e+02 1.600e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 23:58:28,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-11-26 23:58:30,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3631920.0, ans=0.0 2023-11-26 23:58:30,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3631920.0, ans=0.125 2023-11-26 23:58:40,017 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-26 23:58:45,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3631986.6666666665, ans=0.0 2023-11-26 23:58:49,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3631986.6666666665, ans=0.125 2023-11-26 23:59:13,824 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3750, loss[loss=0.0977, simple_loss=0.1361, pruned_loss=0.02304, audio_tagging_loss=0.006618, over 15450.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09042, pruned_loss=0.01227, audio_tagging_loss=0.008576, over 3048568.20 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:59:18,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3632186.6666666665, ans=0.0 2023-11-26 23:59:30,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3632253.3333333335, ans=0.125 2023-11-26 23:59:35,699 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-26 23:59:36,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3632320.0, ans=0.125 2023-11-26 23:59:49,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632386.6666666665, ans=0.1 2023-11-26 23:59:51,096 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 00:00:09,625 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3800, loss[loss=0.06414, simple_loss=0.07964, pruned_loss=0.01138, audio_tagging_loss=0.01294, over 15067.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09063, pruned_loss=0.01217, audio_tagging_loss=0.008643, over 3050760.69 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:00:14,899 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.124e+01 9.737e+01 1.067e+02 1.479e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:00:22,503 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:00:23,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3632586.6666666665, ans=0.0 2023-11-27 00:00:31,617 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-27 00:00:33,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3632653.3333333335, ans=0.125 2023-11-27 00:00:35,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2023-11-27 00:00:52,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3632720.0, ans=0.125 2023-11-27 00:01:02,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3632786.6666666665, ans=0.125 2023-11-27 00:01:04,887 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3850, loss[loss=0.06148, simple_loss=0.08364, pruned_loss=0.008934, audio_tagging_loss=0.01073, over 15136.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09119, pruned_loss=0.01217, audio_tagging_loss=0.008616, over 3057026.17 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:01:05,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3632853.3333333335, ans=0.125 2023-11-27 00:01:10,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3632853.3333333335, ans=0.125 2023-11-27 00:01:10,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3632853.3333333335, ans=0.1 2023-11-27 00:01:12,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3632853.3333333335, ans=0.125 2023-11-27 00:01:14,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3632853.3333333335, ans=0.1 2023-11-27 00:01:23,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. 
limit=15.0 2023-11-27 00:01:28,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-27 00:01:35,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3632986.6666666665, ans=0.0 2023-11-27 00:01:53,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3633120.0, ans=10.0 2023-11-27 00:01:54,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633120.0, ans=0.1 2023-11-27 00:01:58,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=22.5 2023-11-27 00:02:01,513 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3900, loss[loss=0.07333, simple_loss=0.105, pruned_loss=0.01351, audio_tagging_loss=0.007291, over 15929.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09065, pruned_loss=0.01214, audio_tagging_loss=0.008626, over 3052023.13 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:02:02,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3633186.6666666665, ans=0.2 2023-11-27 00:02:07,298 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.766e+01 9.510e+01 1.042e+02 1.590e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:02:23,960 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-27 00:02:58,142 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3950, loss[loss=0.0565, simple_loss=0.07703, pruned_loss=0.009002, audio_tagging_loss=0.008986, over 14456.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09082, pruned_loss=0.01217, audio_tagging_loss=0.008588, over 3050702.51 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:02:59,420 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:03:10,249 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:03:19,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-27 00:03:28,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3633653.3333333335, ans=0.2 2023-11-27 00:03:48,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3633786.6666666665, ans=0.2 2023-11-27 00:03:51,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3633786.6666666665, ans=0.0 2023-11-27 00:03:53,764 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4000, loss[loss=0.066, simple_loss=0.08639, pruned_loss=0.01456, audio_tagging_loss=0.00825, over 15786.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.0918, pruned_loss=0.01244, audio_tagging_loss=0.008576, over 3056706.55 frames. 
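
The per-batch `train_asr.py:1235` entries decompose the objective into `simple_loss`, `pruned_loss`, and `audio_tagging_loss`. The logged totals are consistent with a fixed linear combination, 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for example, 0.5 * 0.0918 + 0.01244 + 0.008576 ~ 0.06691 in the batch 4000 entry just above). A minimal sketch under that assumption; any warmup-dependent reweighting used early in training is not visible at this point in the log:

```python
# Sketch: combine the logged loss terms. The 0.5 and 1.0 weights are
# inferred from the logged totals themselves (0.5 * simple_loss
# + pruned_loss + audio_tagging_loss reproduces the tot_loss values here).
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 4000 entry above reports tot_loss[loss=0.06691, ...]:
assert abs(combine_losses(0.0918, 0.01244, 0.008576) - 0.06691) < 1e-4
```
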
], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:03:57,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3633853.3333333335, ans=0.09899494936611666 2023-11-27 00:03:59,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 9.088e+01 9.544e+01 1.045e+02 1.311e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 00:04:06,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3633920.0, ans=0.125 2023-11-27 00:04:09,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3633920.0, ans=0.0 2023-11-27 00:04:13,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3633920.0, ans=0.125 2023-11-27 00:04:16,125 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-27 00:04:33,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2023-11-27 00:04:37,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3634120.0, ans=0.125 2023-11-27 00:04:49,486 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4050, loss[loss=0.06709, simple_loss=0.09492, pruned_loss=0.009704, audio_tagging_loss=0.009922, over 14867.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09135, pruned_loss=0.01235, audio_tagging_loss=0.00865, over 3054945.75 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:04:52,291 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:04:59,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. limit=6.0 2023-11-27 00:05:12,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-27 00:05:27,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3634386.6666666665, ans=0.125 2023-11-27 00:05:28,955 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:05:32,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3634386.6666666665, ans=0.5 2023-11-27 00:05:36,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3634453.3333333335, ans=0.125 2023-11-27 00:05:40,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3634453.3333333335, ans=0.04949747468305833 2023-11-27 00:05:46,168 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4100, loss[loss=0.06629, simple_loss=0.08863, pruned_loss=0.0117, audio_tagging_loss=0.01027, over 15535.00 frames. 
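
The `optim.py:476` entries summarize the recent distribution of gradient norms as five order statistics (min, lower quartile, median, upper quartile, max) plus a clipping threshold; in each entry the threshold is 2.0 times the logged median (here 2.0 * 9.544e+01 ~ 1.909e+02), matching `Clipping_scale=2.0`. A sketch of producing such a diagnostic from a window of recent norms (the window-keeping is an assumption; ScaledAdam's actual bookkeeping lives in icefall's `optim.py`):

```python
import torch

# Sketch: derive the logged diagnostic from a window of recent gradient
# norms. The threshold rule (clipping_scale * median) matches the logged
# numbers; the sliding window itself is an assumption.
def grad_norm_report(recent_norms: list, clipping_scale: float = 2.0):
    norms = torch.tensor(recent_norms, dtype=torch.float32)
    quartiles = torch.quantile(
        norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]        # 2.0 x median
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles.tolist(), threshold.item(), percent_clipped.item()

# e.g. a median of 9.544e+01 yields threshold 1.909e+02, as logged above;
# percent-clipped=0.0 means no recent norm exceeded that threshold.
```
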
], tot_loss[loss=0.06622, simple_loss=0.09078, pruned_loss=0.01217, audio_tagging_loss=0.008662, over 3055488.02 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:05:46,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3634520.0, ans=0.0 2023-11-27 00:05:52,466 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.888e+01 9.665e+01 1.037e+02 1.522e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:06:07,390 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-27 00:06:17,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3634653.3333333335, ans=0.125 2023-11-27 00:06:33,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3634786.6666666665, ans=0.02 2023-11-27 00:06:38,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3634786.6666666665, ans=0.1 2023-11-27 00:06:41,829 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4150, loss[loss=0.06855, simple_loss=0.09246, pruned_loss=0.01339, audio_tagging_loss=0.008934, over 14178.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0915, pruned_loss=0.0123, audio_tagging_loss=0.008618, over 3054762.97 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:07:02,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3634920.0, ans=0.1 2023-11-27 00:07:04,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-27 00:07:07,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3634986.6666666665, ans=0.125 2023-11-27 00:07:13,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0 2023-11-27 00:07:17,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3635053.3333333335, ans=0.125 2023-11-27 00:07:22,309 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:07:35,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3635120.0, ans=0.0 2023-11-27 00:07:37,620 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4200, loss[loss=0.07233, simple_loss=0.1025, pruned_loss=0.01244, audio_tagging_loss=0.00865, over 16358.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09033, pruned_loss=0.01204, audio_tagging_loss=0.008464, over 3054344.89 frames. 
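
The WARNING entries that exclude AudioSet cuts follow from the frame subsampling: 100 input frames map to 23 encoder frames, fewer than the 24 BPE tokens of the dummy transcript, so no monotonic transducer alignment exists. A sketch of that check, using the conv-frontend length formula that reproduces the logged 100 -> 23 mapping (assumed to be the one this recipe uses):

```python
# Sketch of the exclusion rule behind the WARNING entries. The length
# formula is assumed to match this recipe's encoder_embed; it reproduces
# the logged 100 -> 23 mapping.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one encoder frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the dummy-text AudioSet cuts are dropped
```
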
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:07:40,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3635186.6666666665, ans=0.1 2023-11-27 00:07:44,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.031e+01 9.580e+01 1.007e+02 1.196e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:07:50,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3635253.3333333335, ans=0.2 2023-11-27 00:07:52,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3635253.3333333335, ans=0.125 2023-11-27 00:08:00,809 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-27 00:08:08,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3635320.0, ans=0.0 2023-11-27 00:08:25,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3635453.3333333335, ans=0.2 2023-11-27 00:08:25,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3635453.3333333335, ans=0.125 2023-11-27 00:08:33,919 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4250, loss[loss=0.05469, simple_loss=0.07171, pruned_loss=0.01054, audio_tagging_loss=0.008299, over 14383.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.0908, pruned_loss=0.01208, audio_tagging_loss=0.008342, over 3052599.21 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:08:56,283 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-27 00:09:14,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3635720.0, ans=0.125 2023-11-27 00:09:30,153 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4300, loss[loss=0.06306, simple_loss=0.09228, pruned_loss=0.009089, audio_tagging_loss=0.007826, over 16068.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0894, pruned_loss=0.01179, audio_tagging_loss=0.008395, over 3054207.41 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:09:31,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=12.0 2023-11-27 00:09:33,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.67 vs. 
limit=15.0 2023-11-27 00:09:36,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.001e+01 9.508e+01 1.030e+02 1.268e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:09:40,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3635920.0, ans=0.1 2023-11-27 00:09:52,704 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-27 00:10:04,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3636053.3333333335, ans=0.125 2023-11-27 00:10:15,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3636120.0, ans=0.125 2023-11-27 00:10:25,679 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4350, loss[loss=0.0726, simple_loss=0.09829, pruned_loss=0.0133, audio_tagging_loss=0.01015, over 14225.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08886, pruned_loss=0.01162, audio_tagging_loss=0.008386, over 3055637.76 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:10:44,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3636253.3333333335, ans=0.125 2023-11-27 00:10:49,143 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-27 00:10:55,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2023-11-27 00:11:11,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3636453.3333333335, ans=0.0 2023-11-27 00:11:19,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3636453.3333333335, ans=0.125 2023-11-27 00:11:20,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=12.0 2023-11-27 00:11:22,381 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4400, loss[loss=0.05627, simple_loss=0.0683, pruned_loss=0.01216, audio_tagging_loss=0.009957, over 15165.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08931, pruned_loss=0.01175, audio_tagging_loss=0.008396, over 3062942.57 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:11:30,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 9.047e+01 9.734e+01 1.041e+02 1.241e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:11:45,019 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-27 00:11:49,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3636653.3333333335, ans=0.1 2023-11-27 00:12:09,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.25 vs. limit=15.0 2023-11-27 00:12:18,830 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4450, loss[loss=0.07119, simple_loss=0.09883, pruned_loss=0.01457, audio_tagging_loss=0.007207, over 14953.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08926, pruned_loss=0.01184, audio_tagging_loss=0.0084, over 3059298.35 frames. 
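
The `scaling.py:213` entries trace `ScheduledFloat` hyperparameters: regularization knobs such as skip rates and dropout probabilities whose current value (`ans`) is a piecewise-linear function of the batch count. A toy re-implementation under that reading; this late in training most schedules have decayed to their final values, hence the many `ans=0.0` entries:

```python
# Toy piecewise-linear schedule, assumed to mirror what the
# "ScheduledFloat: name=..., batch_count=..., ans=..." entries report.
class ToyScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a skip rate decaying from 0.1 to 0.0 over the first 20k batches:
skip_rate = ToyScheduledFloat((0.0, 0.1), (20000.0, 0.0))
print(skip_rate.value(3636453.3333333335))  # 0.0, as in "ans=0.0" above
```
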
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:12:41,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-27 00:12:44,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3636986.6666666665, ans=0.125 2023-11-27 00:12:45,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3636986.6666666665, ans=0.125 2023-11-27 00:12:45,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3636986.6666666665, ans=0.0 2023-11-27 00:12:47,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3636986.6666666665, ans=0.0 2023-11-27 00:12:54,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-27 00:12:59,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3637053.3333333335, ans=0.125 2023-11-27 00:13:14,871 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4500, loss[loss=0.05077, simple_loss=0.06065, pruned_loss=0.006934, audio_tagging_loss=0.01351, over 13697.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.09012, pruned_loss=0.01199, audio_tagging_loss=0.008391, over 3059426.32 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:13:23,384 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.728e+01 9.573e+01 1.027e+02 1.215e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 00:13:25,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3637253.3333333335, ans=0.07 2023-11-27 00:13:26,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3637253.3333333335, ans=0.0 2023-11-27 00:13:29,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3637253.3333333335, ans=0.0 2023-11-27 00:13:34,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3637253.3333333335, ans=0.0 2023-11-27 00:13:37,796 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-27 00:13:42,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3637320.0, ans=0.0 2023-11-27 00:13:49,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3637386.6666666665, ans=0.125 2023-11-27 00:13:52,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3637386.6666666665, ans=0.2 2023-11-27 00:14:02,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3637453.3333333335, ans=0.2 2023-11-27 00:14:11,575 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4550, loss[loss=0.07727, simple_loss=0.1006, pruned_loss=0.01728, audio_tagging_loss=0.009665, over 14671.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.09, pruned_loss=0.01202, audio_tagging_loss=0.008392, over 3057389.19 frames. 
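
The fractional `batch_count` values (thirds, e.g. 3636986.6666666665) indicate a duration-adjusted counter rather than the raw step count: pairing `Current batch idx: 545550` with the neighbouring `batch_count` of ~3.637e6 gives a ratio of roughly 20/3. A plausible reconstruction, modeled on icefall's `get_adjusted_batch_count` (the scaling constant is inferred from the log, not read from a config):

```python
# Assumed reconstruction of the duration-adjusted batch counter that the
# schedules consume. The 20/3 ratio (total batch duration relative to a
# reference duration) is inferred from the log lines themselves.
def adjusted_batch_count(batch_idx_train: int) -> float:
    return batch_idx_train * 20 / 3

# "Current batch idx: 545550" sits next to batch_count ~ 3.637e6 above:
print(adjusted_batch_count(545550))  # 3637000.0
```
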
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:14:15,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3637520.0, ans=0.0 2023-11-27 00:14:33,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-27 00:14:37,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3637653.3333333335, ans=0.0 2023-11-27 00:14:50,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3637720.0, ans=0.125 2023-11-27 00:14:54,426 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:15:06,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3637853.3333333335, ans=0.125 2023-11-27 00:15:07,704 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4600, loss[loss=0.07239, simple_loss=0.09603, pruned_loss=0.01332, audio_tagging_loss=0.01106, over 14743.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09054, pruned_loss=0.01224, audio_tagging_loss=0.008442, over 3053731.76 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:15:15,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.975e+01 9.578e+01 1.039e+02 1.809e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:15:18,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3637920.0, ans=0.04949747468305833 2023-11-27 00:15:19,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3637920.0, ans=0.2 2023-11-27 00:15:24,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-27 00:15:29,860 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-27 00:15:45,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3638053.3333333335, ans=0.125 2023-11-27 00:15:51,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3638120.0, ans=0.05 2023-11-27 00:16:02,966 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4650, loss[loss=0.05919, simple_loss=0.0771, pruned_loss=0.008629, audio_tagging_loss=0.01201, over 14760.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08948, pruned_loss=0.01203, audio_tagging_loss=0.008569, over 3054015.73 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:16:04,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3638186.6666666665, ans=0.1 2023-11-27 00:16:18,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3638253.3333333335, ans=0.125 2023-11-27 00:16:26,553 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-27 00:16:26,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3638320.0, ans=0.0 2023-11-27 00:16:36,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3638386.6666666665, ans=0.07 2023-11-27 00:16:55,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3638453.3333333335, ans=0.1 2023-11-27 00:16:59,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3638520.0, ans=0.07 2023-11-27 00:16:59,969 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4700, loss[loss=0.06395, simple_loss=0.0814, pruned_loss=0.01464, audio_tagging_loss=0.008613, over 14931.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08969, pruned_loss=0.01214, audio_tagging_loss=0.008605, over 3054008.80 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:17:00,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3638520.0, ans=0.0 2023-11-27 00:17:07,411 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.156e+01 9.734e+01 1.046e+02 1.264e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:17:07,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3638520.0, ans=0.125 2023-11-27 00:17:10,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3638586.6666666665, ans=0.125 2023-11-27 00:17:21,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-27 00:17:41,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3638720.0, ans=0.125 2023-11-27 00:17:48,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3638786.6666666665, ans=0.5 2023-11-27 00:17:56,610 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4750, loss[loss=0.05988, simple_loss=0.08036, pruned_loss=0.009789, audio_tagging_loss=0.009908, over 14920.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08882, pruned_loss=0.01202, audio_tagging_loss=0.008761, over 3056375.57 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:18:10,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3638920.0, ans=0.125 2023-11-27 00:18:18,625 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-27 00:18:21,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. 
limit=6.0 2023-11-27 00:18:51,456 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4800, loss[loss=0.0598, simple_loss=0.08557, pruned_loss=0.007029, audio_tagging_loss=0.009982, over 15396.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08947, pruned_loss=0.01206, audio_tagging_loss=0.008833, over 3057563.91 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:18:51,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3639186.6666666665, ans=0.2 2023-11-27 00:18:59,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 8.803e+01 9.667e+01 1.040e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:19:14,546 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-27 00:19:15,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=12.0 2023-11-27 00:19:21,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3639320.0, ans=0.09899494936611666 2023-11-27 00:19:42,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3639453.3333333335, ans=0.1 2023-11-27 00:19:47,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3639520.0, ans=0.125 2023-11-27 00:19:48,912 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4850, loss[loss=0.07201, simple_loss=0.1016, pruned_loss=0.01317, audio_tagging_loss=0.008056, over 15974.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08969, pruned_loss=0.01213, audio_tagging_loss=0.008947, over 3053335.51 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:20:01,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3639586.6666666665, ans=0.125 2023-11-27 00:20:10,920 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-27 00:20:16,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=12.0 2023-11-27 00:20:25,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3639720.0, ans=0.0 2023-11-27 00:20:39,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3639786.6666666665, ans=0.125 2023-11-27 00:20:44,964 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4900, loss[loss=0.06639, simple_loss=0.09763, pruned_loss=0.01083, audio_tagging_loss=0.006751, over 16236.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08874, pruned_loss=0.01198, audio_tagging_loss=0.008922, over 3053649.70 frames. 
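
The `scaling.py:1022` Whitening entries compare a per-module statistic against a limit (`metric=8.98 vs. limit=12.0`). A plausible reading is that the metric measures how far the activation covariance is from a multiple of the identity, as the ratio mean(eig^2) / mean(eig)^2 over the covariance eigenvalues: exactly 1.0 for perfectly white features, larger when variance concentrates in a few directions. A sketch under that assumption:

```python
import torch

# Sketch of a whitening metric in the spirit of the "metric=X vs. limit=Y"
# entries: mean(eig^2) / mean(eig)^2 over eigenvalues of the activation
# covariance (an assumed reading; the real code is icefall's scaling.py).
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

# Near-white random features give a metric close to 1, far below limits
# like 6.0 or 22.5; a penalty is (assumed to be) applied near the limit.
print(whitening_metric(torch.randn(1000, 128)))
```
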
], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:20:48,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3639853.3333333335, ans=0.025 2023-11-27 00:20:52,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.929e+01 9.407e+01 1.023e+02 1.723e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 00:20:53,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3639853.3333333335, ans=0.0 2023-11-27 00:20:58,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3639920.0, ans=0.1 2023-11-27 00:21:01,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3639920.0, ans=0.05 2023-11-27 00:21:06,442 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-27 00:21:18,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3640053.3333333335, ans=0.0 2023-11-27 00:21:37,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3640120.0, ans=0.2 2023-11-27 00:21:38,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3640120.0, ans=0.1 2023-11-27 00:21:40,252 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4950, loss[loss=0.07226, simple_loss=0.1057, pruned_loss=0.0116, audio_tagging_loss=0.007814, over 15438.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08965, pruned_loss=0.01218, audio_tagging_loss=0.008719, over 3060129.03 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:21:56,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3640253.3333333335, ans=0.0 2023-11-27 00:22:02,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3640320.0, ans=0.125 2023-11-27 00:22:02,998 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-27 00:22:05,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3640320.0, ans=0.0 2023-11-27 00:22:08,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3640320.0, ans=0.0 2023-11-27 00:22:13,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3640386.6666666665, ans=0.0 2023-11-27 00:22:15,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3640386.6666666665, ans=0.1 2023-11-27 00:22:16,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3640386.6666666665, ans=0.125 2023-11-27 00:22:18,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3640386.6666666665, ans=0.125 2023-11-27 00:22:29,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.73 vs. 
limit=22.5 2023-11-27 00:22:35,935 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5000, loss[loss=0.05717, simple_loss=0.08194, pruned_loss=0.007192, audio_tagging_loss=0.009006, over 15504.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08868, pruned_loss=0.01209, audio_tagging_loss=0.008665, over 3050497.61 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:22:44,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.925e+01 9.606e+01 1.023e+02 1.240e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 00:22:46,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3640586.6666666665, ans=0.125 2023-11-27 00:22:59,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-27 00:23:06,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3640653.3333333335, ans=0.2 2023-11-27 00:23:32,434 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5050, loss[loss=0.04958, simple_loss=0.06334, pruned_loss=0.008358, audio_tagging_loss=0.009553, over 15026.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08923, pruned_loss=0.01212, audio_tagging_loss=0.008586, over 3046274.90 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:23:54,306 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-27 00:24:02,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3640986.6666666665, ans=0.09899494936611666 2023-11-27 00:24:28,499 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5100, loss[loss=0.08358, simple_loss=0.1115, pruned_loss=0.01778, audio_tagging_loss=0.01005, over 14907.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0897, pruned_loss=0.01227, audio_tagging_loss=0.008609, over 3044657.50 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:24:35,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.921e+01 9.596e+01 1.036e+02 1.225e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-27 00:24:37,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3641186.6666666665, ans=0.125 2023-11-27 00:24:51,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546200 2023-11-27 00:25:02,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3641386.6666666665, ans=0.125 2023-11-27 00:25:04,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3641386.6666666665, ans=0.125 2023-11-27 00:25:15,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3641453.3333333335, ans=0.07 2023-11-27 00:25:15,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3641453.3333333335, ans=0.125 2023-11-27 00:25:24,921 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5150, loss[loss=0.05558, simple_loss=0.07472, pruned_loss=0.01067, audio_tagging_loss=0.007557, over 16439.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08969, pruned_loss=0.01221, audio_tagging_loss=0.008599, over 3049717.00 frames. 
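
The learning rate stays pinned at `lr: 1.47e-03` throughout this stretch, consistent with an Eden-style schedule whose batch and epoch factors change very slowly by batch ~545k and epoch 46. A sketch with assumed constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and a scheduler epoch counter of 45: all chosen because they reproduce the logged value, none of them printed in this part of the log):

```python
# Assumed Eden-style learning-rate schedule; the constants below are
# assumptions chosen to reproduce the logged "lr: 1.47e-03".
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, batch=545000, epoch=45):.2e}")  # 1.47e-03
```
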
], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:25:32,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3641520.0, ans=0.125 2023-11-27 00:25:33,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3641520.0, ans=0.0 2023-11-27 00:25:34,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3641520.0, ans=0.0 2023-11-27 00:25:47,982 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546250 2023-11-27 00:25:50,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3641653.3333333335, ans=0.125 2023-11-27 00:25:52,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2023-11-27 00:26:10,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3641786.6666666665, ans=0.0 2023-11-27 00:26:20,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3641853.3333333335, ans=0.125 2023-11-27 00:26:20,888 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5200, loss[loss=0.08632, simple_loss=0.1279, pruned_loss=0.01569, audio_tagging_loss=0.006699, over 15751.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0908, pruned_loss=0.01227, audio_tagging_loss=0.008437, over 3051191.99 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:26:24,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3641853.3333333335, ans=0.0 2023-11-27 00:26:25,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3641853.3333333335, ans=0.125 2023-11-27 00:26:29,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.022e+01 9.726e+01 1.018e+02 1.270e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 00:26:39,117 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:26:43,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546300 2023-11-27 00:27:07,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3642120.0, ans=0.1 2023-11-27 00:27:08,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-11-27 00:27:16,513 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5250, loss[loss=0.05838, simple_loss=0.08583, pruned_loss=0.008533, audio_tagging_loss=0.00693, over 14777.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09041, pruned_loss=0.0122, audio_tagging_loss=0.008469, over 3050084.57 frames. 
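
Many of the scheduled values above belong to `Balancer` modules (`balancer1.prob`, `min_positive`, `max_abs`). Read as constraints on per-channel activation statistics, `min_positive=0.025` asks each channel to be positive on at least 2.5% of frames, `max_abs=10.0` bounds the mean absolute activation, and `prob` is the probability of applying the correcting gradient on a given batch. A schematic check of the statistics only (the gradient mechanics are not reproduced here, and this reading of the fields is an assumption):

```python
import torch

# Schematic view of the statistics a Balancer constrains; the real module
# nudges gradients with probability `prob` rather than hard-checking
# (this interpretation of min_positive / max_abs is an assumption).
def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.025,
                        max_abs: float = 10.0) -> dict:
    # x: (num_frames, num_channels)
    frac_positive = (x > 0).float().mean(dim=0)  # per-channel positive rate
    mean_abs = x.abs().mean(dim=0)               # per-channel mean |x|
    return {"too_rarely_positive": int((frac_positive < min_positive).sum()),
            "too_large": int((mean_abs > max_abs).sum())}

print(balancer_violations(torch.randn(1000, 256)))  # healthy: both zero
```
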
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:27:25,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3642186.6666666665, ans=15.0 2023-11-27 00:27:31,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642253.3333333335, ans=0.1 2023-11-27 00:27:38,974 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546350 2023-11-27 00:27:40,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3642320.0, ans=0.015 2023-11-27 00:27:42,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3642320.0, ans=0.125 2023-11-27 00:27:43,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3642320.0, ans=0.07 2023-11-27 00:28:01,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3642453.3333333335, ans=0.025 2023-11-27 00:28:02,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3642453.3333333335, ans=0.125 2023-11-27 00:28:07,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3642453.3333333335, ans=0.0 2023-11-27 00:28:08,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3642453.3333333335, ans=0.1 2023-11-27 00:28:11,781 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5300, loss[loss=0.07254, simple_loss=0.1005, pruned_loss=0.01459, audio_tagging_loss=0.007699, over 16250.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.0916, pruned_loss=0.01237, audio_tagging_loss=0.008406, over 3052386.42 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:28:22,506 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.037e+01 9.686e+01 1.067e+02 1.240e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 00:28:23,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2023-11-27 00:28:35,407 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546400 2023-11-27 00:28:35,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3642653.3333333335, ans=0.02 2023-11-27 00:28:39,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3642653.3333333335, ans=0.025 2023-11-27 00:29:02,460 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:29:05,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3642786.6666666665, ans=0.125 2023-11-27 00:29:08,585 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5350, loss[loss=0.05795, simple_loss=0.0729, pruned_loss=0.01173, audio_tagging_loss=0.009759, over 16263.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09053, pruned_loss=0.01221, audio_tagging_loss=0.008443, over 3050957.84 frames. 
], batch size: 62, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:29:14,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3642853.3333333335, ans=0.125 2023-11-27 00:29:20,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3642920.0, ans=0.2 2023-11-27 00:29:23,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642920.0, ans=0.1 2023-11-27 00:29:28,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3642920.0, ans=0.125 2023-11-27 00:29:31,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546450 2023-11-27 00:29:47,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3643053.3333333335, ans=0.125 2023-11-27 00:29:51,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3643053.3333333335, ans=10.0 2023-11-27 00:30:05,135 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5400, loss[loss=0.07113, simple_loss=0.09641, pruned_loss=0.01288, audio_tagging_loss=0.01004, over 16747.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09198, pruned_loss=0.01243, audio_tagging_loss=0.008405, over 3051708.58 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:30:09,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3643186.6666666665, ans=0.125 2023-11-27 00:30:14,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.994e+01 9.613e+01 1.047e+02 1.327e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 00:30:23,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3643253.3333333335, ans=0.2 2023-11-27 00:30:24,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=22.5 2023-11-27 00:30:27,029 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546500 2023-11-27 00:30:27,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2023-11-27 00:30:44,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3643386.6666666665, ans=0.125 2023-11-27 00:30:44,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3643386.6666666665, ans=0.07 2023-11-27 00:30:56,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3643453.3333333335, ans=0.125 2023-11-27 00:30:58,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3643453.3333333335, ans=0.125 2023-11-27 00:31:00,368 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5450, loss[loss=0.06681, simple_loss=0.0844, pruned_loss=0.01417, audio_tagging_loss=0.01043, over 15106.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09199, pruned_loss=0.01243, audio_tagging_loss=0.008413, over 3053346.97 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:31:00,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3643520.0, ans=0.125 2023-11-27 00:31:22,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3643653.3333333335, ans=0.125 2023-11-27 00:31:23,089 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546550 2023-11-27 00:31:37,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3643720.0, ans=0.125 2023-11-27 00:31:47,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3643786.6666666665, ans=0.1 2023-11-27 00:31:50,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3643786.6666666665, ans=0.125 2023-11-27 00:31:56,712 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5500, loss[loss=0.06474, simple_loss=0.0846, pruned_loss=0.01167, audio_tagging_loss=0.01077, over 15554.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09193, pruned_loss=0.01244, audio_tagging_loss=0.008374, over 3046965.14 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:31:58,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3643853.3333333335, ans=0.125 2023-11-27 00:32:01,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3643853.3333333335, ans=0.125 2023-11-27 00:32:06,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3643853.3333333335, ans=0.2 2023-11-27 00:32:06,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3643853.3333333335, ans=0.2 2023-11-27 00:32:06,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.879e+01 9.698e+01 1.044e+02 1.314e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-27 00:32:08,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3643920.0, ans=0.125 2023-11-27 00:32:11,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3643920.0, ans=0.0 2023-11-27 00:32:15,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3643920.0, ans=0.0 2023-11-27 00:32:19,450 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546600 2023-11-27 00:32:34,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-11-27 00:32:50,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2023-11-27 00:32:53,006 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5550, loss[loss=0.06629, simple_loss=0.0953, pruned_loss=0.01083, audio_tagging_loss=0.007804, over 14694.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09149, pruned_loss=0.01231, audio_tagging_loss=0.00839, over 3044896.84 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:33:01,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3644186.6666666665, ans=0.125 2023-11-27 00:33:07,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3644253.3333333335, ans=0.0 2023-11-27 00:33:15,257 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546650 2023-11-27 00:33:17,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3644320.0, ans=0.0 2023-11-27 00:33:32,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3644386.6666666665, ans=0.2 2023-11-27 00:33:38,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3644453.3333333335, ans=0.0 2023-11-27 00:33:47,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3644520.0, ans=0.1 2023-11-27 00:33:48,614 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5600, loss[loss=0.04839, simple_loss=0.06313, pruned_loss=0.006169, audio_tagging_loss=0.01065, over 15061.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09085, pruned_loss=0.0122, audio_tagging_loss=0.008563, over 3045411.23 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:33:58,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.835e+01 9.433e+01 1.028e+02 1.297e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 00:34:11,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546700 2023-11-27 00:34:22,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3644720.0, ans=0.125 2023-11-27 00:34:28,987 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:34:37,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3644786.6666666665, ans=22.5 2023-11-27 00:34:42,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3644786.6666666665, ans=0.125 2023-11-27 00:34:42,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3644786.6666666665, ans=0.125 2023-11-27 00:34:44,710 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5650, loss[loss=0.07022, simple_loss=0.1017, pruned_loss=0.01428, audio_tagging_loss=0.005099, over 14512.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09059, pruned_loss=0.01232, audio_tagging_loss=0.008754, over 3046128.78 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:34:49,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3644853.3333333335, ans=0.125 2023-11-27 00:35:03,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3644920.0, ans=0.07 2023-11-27 00:35:06,495 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546750 2023-11-27 00:35:07,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3644986.6666666665, ans=0.2 2023-11-27 00:35:34,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3645120.0, ans=0.0 2023-11-27 00:35:40,807 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5700, loss[loss=0.06908, simple_loss=0.09285, pruned_loss=0.01652, audio_tagging_loss=0.006137, over 14307.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0908, pruned_loss=0.01244, audio_tagging_loss=0.008716, over 3048291.10 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:35:45,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3645186.6666666665, ans=0.125 2023-11-27 00:35:50,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 8.853e+01 9.368e+01 1.022e+02 1.504e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 00:36:03,315 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546800 2023-11-27 00:36:09,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3645320.0, ans=0.125 2023-11-27 00:36:10,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3645320.0, ans=0.125 2023-11-27 00:36:11,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.15 vs. limit=22.5 2023-11-27 00:36:16,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3645386.6666666665, ans=0.0 2023-11-27 00:36:29,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3645453.3333333335, ans=0.125 2023-11-27 00:36:36,194 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5750, loss[loss=0.057, simple_loss=0.07084, pruned_loss=0.01147, audio_tagging_loss=0.01011, over 15394.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09039, pruned_loss=0.01229, audio_tagging_loss=0.008608, over 3051542.95 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:36:45,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3645520.0, ans=0.125 2023-11-27 00:36:53,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3645586.6666666665, ans=0.125 2023-11-27 00:36:59,305 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546850 2023-11-27 00:37:12,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3645720.0, ans=0.0 2023-11-27 00:37:12,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3645720.0, ans=0.0 2023-11-27 00:37:32,702 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5800, loss[loss=0.07272, simple_loss=0.1012, pruned_loss=0.01346, audio_tagging_loss=0.008669, over 14800.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08966, pruned_loss=0.01202, audio_tagging_loss=0.008495, over 3042370.22 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:37:37,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3645853.3333333335, ans=0.0 2023-11-27 00:37:42,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.951e+01 9.661e+01 1.044e+02 1.253e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 00:37:48,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3645920.0, ans=0.125 2023-11-27 00:37:55,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546900 2023-11-27 00:38:00,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3645986.6666666665, ans=0.125 2023-11-27 00:38:03,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3645986.6666666665, ans=0.1 2023-11-27 00:38:23,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3646120.0, ans=0.2 2023-11-27 00:38:29,076 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5850, loss[loss=0.07186, simple_loss=0.0956, pruned_loss=0.01573, audio_tagging_loss=0.008333, over 14087.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08911, pruned_loss=0.01195, audio_tagging_loss=0.008528, over 3041440.92 frames. 
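Most of the scaling.py:213 records are ScheduledFloat lookups: a named hyperparameter (a dropout probability, a skip rate, a balancer limit) whose value "ans" is a function of batch_count. A minimal stand-in with piecewise-linear interpolation between breakpoints is sketched below; the breakpoint values are invented, and the real class in scaling.py carries more machinery (defaults, arithmetic on schedules).

import bisect

class ScheduledFloat:
    # Minimal stand-in: a float defined by (batch_count, value) breakpoints,
    # interpolated linearly between them and held constant outside.
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value_at(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches, then
# flat: by batch_count ~3.64e6 it reads 0.1, the flavor of the
# "dropout_p ... ans=0.1" records (breakpoints here are made up).
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert abs(dropout_p.value_at(3644520.0) - 0.1) < 1e-9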
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:38:30,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3646186.6666666665, ans=0.1 2023-11-27 00:38:31,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3646186.6666666665, ans=0.0 2023-11-27 00:38:35,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3646186.6666666665, ans=0.1 2023-11-27 00:38:50,962 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546950 2023-11-27 00:38:53,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3646320.0, ans=0.0 2023-11-27 00:39:01,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3646386.6666666665, ans=0.125 2023-11-27 00:39:11,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3646386.6666666665, ans=0.0 2023-11-27 00:39:24,508 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5900, loss[loss=0.05807, simple_loss=0.08043, pruned_loss=0.008354, audio_tagging_loss=0.009498, over 14291.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08863, pruned_loss=0.01191, audio_tagging_loss=0.008573, over 3044331.26 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:39:24,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3646520.0, ans=0.09899494936611666 2023-11-27 00:39:34,482 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.740e+01 9.357e+01 9.859e+01 1.378e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 00:39:46,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2023-11-27 00:39:47,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547000 2023-11-27 00:39:51,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3646653.3333333335, ans=0.0 2023-11-27 00:39:52,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2023-11-27 00:39:54,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3646653.3333333335, ans=0.125 2023-11-27 00:40:17,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3646786.6666666665, ans=0.125 2023-11-27 00:40:20,827 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5950, loss[loss=0.06842, simple_loss=0.09451, pruned_loss=0.01346, audio_tagging_loss=0.007706, over 15980.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08854, pruned_loss=0.01184, audio_tagging_loss=0.00854, over 3053917.42 frames. 
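The scaling.py:1022 Whitening records compare a per-module statistic ("metric") against a limit, with a penalty presumably applied only when the metric exceeds the limit. One plausible reading of the metric is an inverse-participation-ratio over the feature covariance eigenvalues: 1.0 for perfectly white features, growing toward num_channels as variance concentrates in few directions. The logged values (e.g. 18.15 vs. limit=22.5 at 512 channels) fit that range. The formula below is that reading, not a verified copy of scaling.py.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels) activations.
    # Returns per_group * E[eigval^2] / (E[eigval])^2 of the covariance,
    # averaged over groups; lies in [1, per_group].
    n, c = x.shape
    per_group = c // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * per_group:(g + 1) * per_group]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.t() @ xg) / n
        metrics.append(per_group * (cov @ cov).diagonal().sum()
                       / cov.diagonal().sum() ** 2)
    return float(torch.stack(metrics).mean())

torch.manual_seed(0)
white = torch.randn(10000, 512)
assert whitening_metric(white) < 1.2   # near 1.0 for white noise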
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:40:31,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3646920.0, ans=0.0 2023-11-27 00:40:40,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3646920.0, ans=0.125 2023-11-27 00:40:43,312 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547050 2023-11-27 00:40:49,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-11-27 00:41:16,152 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6000, loss[loss=0.06107, simple_loss=0.08039, pruned_loss=0.009654, audio_tagging_loss=0.01122, over 14738.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08809, pruned_loss=0.01183, audio_tagging_loss=0.008612, over 3047845.65 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:41:16,153 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 00:41:48,446 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05759, simple_loss=0.05057, pruned_loss=0.005367, audio_tagging_loss=0.02694, over 4681554.00 frames. 2023-11-27 00:41:48,447 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 00:41:58,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.712e+01 9.506e+01 1.018e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-27 00:42:09,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3647320.0, ans=0.125 2023-11-27 00:42:10,641 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547100 2023-11-27 00:42:19,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3647320.0, ans=0.125 2023-11-27 00:42:27,961 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:42:44,317 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6050, loss[loss=0.06535, simple_loss=0.09144, pruned_loss=0.01302, audio_tagging_loss=0.00661, over 15260.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08996, pruned_loss=0.01197, audio_tagging_loss=0.008455, over 3049200.79 frames. 
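The validation block in the record above ("Computing validation loss", the validation loss line, and "Maximum memory allocated so far is 26096MB") corresponds to a standard no-grad pass over the dev loader plus a peak-memory readout. A sketch is below; loss_fn and the batch keys are placeholders, while torch.cuda.max_memory_allocated is the actual PyTorch call behind this kind of MB figure.

import torch

def compute_validation_loss(model, valid_loader, device, loss_fn):
    # Frame-weighted average loss over the dev set, without gradients,
    # followed by the peak-GPU-memory readout seen in the log.
    model.eval()
    tot, frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["features"].to(device)   # assumed batch layout
            loss, num_frames = loss_fn(model, feats, batch)
            tot += float(loss) * num_frames
            frames += num_frames
    model.train()
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot / max(frames, 1):.4f}, "
          f"Maximum memory allocated so far is {mb}MB")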
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:42:47,666 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:43:06,125 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547150 2023-11-27 00:43:15,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3647720.0, ans=0.125 2023-11-27 00:43:33,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3647786.6666666665, ans=0.125 2023-11-27 00:43:40,360 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6100, loss[loss=0.08374, simple_loss=0.1101, pruned_loss=0.01977, audio_tagging_loss=0.008924, over 15932.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0902, pruned_loss=0.01203, audio_tagging_loss=0.008505, over 3049128.61 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:43:44,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.31 vs. limit=10.0 2023-11-27 00:43:49,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.942e+01 9.763e+01 1.039e+02 1.274e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 00:43:50,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3647920.0, ans=0.125 2023-11-27 00:43:53,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3647920.0, ans=0.125 2023-11-27 00:43:58,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3647920.0, ans=0.125 2023-11-27 00:44:02,081 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547200 2023-11-27 00:44:05,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3647986.6666666665, ans=0.125 2023-11-27 00:44:05,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3647986.6666666665, ans=0.2 2023-11-27 00:44:17,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3648053.3333333335, ans=0.2 2023-11-27 00:44:17,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.61 vs. limit=15.0 2023-11-27 00:44:35,854 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6150, loss[loss=0.06931, simple_loss=0.09214, pruned_loss=0.0153, audio_tagging_loss=0.007941, over 15682.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09019, pruned_loss=0.01213, audio_tagging_loss=0.008485, over 3046488.78 frames. 
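The scaling.py:1118 "WithLoss" records name a submodule (here the self-attention weights) together with an accumulated auxiliary loss-sum; loss-sum=0.000e+00 simply means nothing had accumulated when the record was emitted. The sketch below shows the general shape of such a wrapper, with an invented penalty; the real mechanism in scaling.py is more involved.

import torch

class WithAuxLoss(torch.nn.Module):
    # Loose sketch: a wrapper that lets a submodule accumulate an auxiliary
    # penalty which is periodically summed, logged and reset.
    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner
        self.loss_sum = 0.0

    def forward(self, x):
        y = self.inner(x)
        if self.training:
            # hypothetical penalty on the wrapped output, for illustration
            self.loss_sum += float((y ** 2).mean())
        return y

def log_aux_losses(model: torch.nn.Module) -> None:
    for name, m in model.named_modules():
        if isinstance(m, WithAuxLoss):
            print(f"WithLoss: name={name}, loss-sum={m.loss_sum:.3e}")
            m.loss_sum = 0.0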
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:44:40,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3648186.6666666665, ans=0.1 2023-11-27 00:44:52,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3648253.3333333335, ans=0.0 2023-11-27 00:44:58,678 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547250 2023-11-27 00:45:07,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3648320.0, ans=0.125 2023-11-27 00:45:18,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3648386.6666666665, ans=0.0 2023-11-27 00:45:23,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3648453.3333333335, ans=0.2 2023-11-27 00:45:23,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3648453.3333333335, ans=0.1 2023-11-27 00:45:31,494 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6200, loss[loss=0.06825, simple_loss=0.09736, pruned_loss=0.01246, audio_tagging_loss=0.007106, over 15907.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08925, pruned_loss=0.01201, audio_tagging_loss=0.008548, over 3044671.10 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:45:36,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3648520.0, ans=0.125 2023-11-27 00:45:43,673 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.925e+01 9.447e+01 1.055e+02 1.440e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 00:45:47,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3648586.6666666665, ans=0.05 2023-11-27 00:45:54,305 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547300 2023-11-27 00:45:56,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3648653.3333333335, ans=0.2 2023-11-27 00:45:56,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3648653.3333333335, ans=0.1 2023-11-27 00:45:59,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3648653.3333333335, ans=0.1 2023-11-27 00:46:28,166 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6250, loss[loss=0.07591, simple_loss=0.1162, pruned_loss=0.0112, audio_tagging_loss=0.006607, over 15026.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08963, pruned_loss=0.01212, audio_tagging_loss=0.008707, over 3043134.35 frames. 
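The fractional frame counts in the tot_loss records (e.g. "over 3044671.10 frames") indicate the running statistics are exponentially decayed rather than plain sums, which would stay integral. A sketch of that bookkeeping is below; the decay constant is an assumption, and the real accounting lives in train_asr.py.

class RunningLoss:
    # Keep exponentially decayed sums of frame-weighted loss and of frame
    # count, and report their ratio; decayed frame sums naturally become
    # fractional, matching the "over N.NN frames" fields.
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frame_sum, 1.0)

tracker = RunningLoss()
tracker.update(0.066, 15000)
tracker.update(0.065, 14800)
print(f"tot_loss={tracker.value:.5f}, over {tracker.frame_sum:.2f} frames")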
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:46:49,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547350 2023-11-27 00:46:54,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3648986.6666666665, ans=0.2 2023-11-27 00:46:59,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3648986.6666666665, ans=0.09899494936611666 2023-11-27 00:47:18,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3649120.0, ans=0.125 2023-11-27 00:47:22,753 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6300, loss[loss=0.0605, simple_loss=0.08732, pruned_loss=0.009528, audio_tagging_loss=0.007314, over 14821.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08987, pruned_loss=0.01231, audio_tagging_loss=0.008796, over 3042143.06 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:47:33,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.827e+01 9.482e+01 1.035e+02 1.564e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 00:47:45,069 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547400 2023-11-27 00:47:45,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3649320.0, ans=0.125 2023-11-27 00:48:04,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3649386.6666666665, ans=0.0 2023-11-27 00:48:08,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3649453.3333333335, ans=0.125 2023-11-27 00:48:18,666 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6350, loss[loss=0.05949, simple_loss=0.07856, pruned_loss=0.01014, audio_tagging_loss=0.01007, over 15549.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09091, pruned_loss=0.01253, audio_tagging_loss=0.008765, over 3044278.63 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:48:24,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3649520.0, ans=0.0 2023-11-27 00:48:41,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547450 2023-11-27 00:48:46,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3649653.3333333335, ans=0.125 2023-11-27 00:48:51,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3649720.0, ans=0.07 2023-11-27 00:48:54,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3649720.0, ans=0.125 2023-11-27 00:48:54,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. 
limit=22.5 2023-11-27 00:49:07,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3649786.6666666665, ans=0.2 2023-11-27 00:49:09,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649786.6666666665, ans=0.1 2023-11-27 00:49:11,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3649786.6666666665, ans=0.125 2023-11-27 00:49:14,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2023-11-27 00:49:15,277 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6400, loss[loss=0.05554, simple_loss=0.0674, pruned_loss=0.009815, audio_tagging_loss=0.01202, over 14140.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08999, pruned_loss=0.01231, audio_tagging_loss=0.008938, over 3046264.53 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:49:26,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.880e+01 9.472e+01 1.045e+02 1.391e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 00:49:37,200 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547500 2023-11-27 00:49:49,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3650053.3333333335, ans=0.0 2023-11-27 00:49:49,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.77 vs. limit=10.0 2023-11-27 00:49:53,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3650053.3333333335, ans=0.0 2023-11-27 00:49:54,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3650053.3333333335, ans=0.2 2023-11-27 00:49:59,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2023-11-27 00:50:02,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3650120.0, ans=0.125 2023-11-27 00:50:10,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3650186.6666666665, ans=0.125 2023-11-27 00:50:11,005 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6450, loss[loss=0.0693, simple_loss=0.09677, pruned_loss=0.01299, audio_tagging_loss=0.007925, over 14758.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09068, pruned_loss=0.01236, audio_tagging_loss=0.008933, over 3045512.68 frames. 
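Many of the ScheduledFloat records concern Balancer fields such as prob, min_positive and min_abs, which constrain per-channel activation statistics. The toy below illustrates the constraint itself: with probability prob per call, the fraction of positive activations per channel is checked against a band. The real Balancer in scaling.py enforces this by modifying gradients in backward rather than via a penalty, and the hard 0/1 count used here is not differentiable, so this is illustration only.

import torch

def balancer_penalty(x: torch.Tensor, min_positive: float = 0.05,
                     max_positive: float = 0.95,
                     prob: float = 0.125) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns a penalty for channels whose
    # positive-activation fraction leaves [min_positive, max_positive],
    # evaluated with probability `prob` (zero otherwise).
    if torch.rand(()) > prob:
        return x.new_zeros(())
    frac_pos = (x > 0).float().mean(dim=0)
    too_low = (min_positive - frac_pos).clamp(min=0.0)
    too_high = (frac_pos - max_positive).clamp(min=0.0)
    return (too_low + too_high).sum()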
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:50:14,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3650186.6666666665, ans=0.0 2023-11-27 00:50:33,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547550 2023-11-27 00:50:52,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3650386.6666666665, ans=0.1 2023-11-27 00:50:54,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3650453.3333333335, ans=0.0 2023-11-27 00:50:55,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2023-11-27 00:51:03,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0 2023-11-27 00:51:05,945 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6500, loss[loss=0.05251, simple_loss=0.06914, pruned_loss=0.01021, audio_tagging_loss=0.007738, over 15032.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08946, pruned_loss=0.0121, audio_tagging_loss=0.008871, over 3044135.01 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:51:15,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3650520.0, ans=0.0 2023-11-27 00:51:17,704 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.951e+01 9.386e+01 1.000e+02 1.193e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-27 00:51:23,787 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:51:26,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3650586.6666666665, ans=0.125 2023-11-27 00:51:29,054 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547600 2023-11-27 00:52:02,832 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6550, loss[loss=0.04967, simple_loss=0.07286, pruned_loss=0.005875, audio_tagging_loss=0.007359, over 15564.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09069, pruned_loss=0.01229, audio_tagging_loss=0.008675, over 3057267.98 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:52:25,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547650 2023-11-27 00:52:40,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3651053.3333333335, ans=0.125 2023-11-27 00:52:42,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3651053.3333333335, ans=0.2 2023-11-27 00:52:56,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-27 00:52:58,296 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6600, loss[loss=0.07084, simple_loss=0.1012, pruned_loss=0.01286, audio_tagging_loss=0.007375, over 14686.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09113, pruned_loss=0.01236, audio_tagging_loss=0.008537, over 3051453.98 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:53:09,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.821e+01 9.435e+01 1.031e+02 1.384e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 00:53:13,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3651253.3333333335, ans=0.125 2023-11-27 00:53:21,075 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547700 2023-11-27 00:53:23,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-11-27 00:53:37,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3651386.6666666665, ans=0.125 2023-11-27 00:53:49,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3651453.3333333335, ans=0.2 2023-11-27 00:53:54,136 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6650, loss[loss=0.06778, simple_loss=0.09691, pruned_loss=0.009196, audio_tagging_loss=0.01013, over 15887.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09081, pruned_loss=0.01246, audio_tagging_loss=0.008541, over 3054842.32 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:53:55,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3651520.0, ans=0.125 2023-11-27 00:54:06,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651586.6666666665, ans=0.1 2023-11-27 00:54:17,095 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547750 2023-11-27 00:54:25,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-27 00:54:28,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2023-11-27 00:54:29,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=22.5 2023-11-27 00:54:48,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3651786.6666666665, ans=0.05 2023-11-27 00:54:50,313 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6700, loss[loss=0.05943, simple_loss=0.0916, pruned_loss=0.007772, audio_tagging_loss=0.005857, over 14963.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09092, pruned_loss=0.01236, audio_tagging_loss=0.008576, over 3051468.92 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:54:53,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3651853.3333333335, ans=0.0 2023-11-27 00:55:01,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.865e+01 9.450e+01 1.017e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 00:55:08,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651920.0, ans=0.1 2023-11-27 00:55:09,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2023-11-27 00:55:12,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547800 2023-11-27 00:55:28,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652053.3333333335, ans=0.1 2023-11-27 00:55:30,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3652053.3333333335, ans=0.1 2023-11-27 00:55:38,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3652120.0, ans=0.125 2023-11-27 00:55:39,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3652120.0, ans=0.09899494936611666 2023-11-27 00:55:46,481 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6750, loss[loss=0.05811, simple_loss=0.08066, pruned_loss=0.00951, audio_tagging_loss=0.008265, over 15898.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08926, pruned_loss=0.012, audio_tagging_loss=0.008642, over 3038126.62 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:56:09,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547850 2023-11-27 00:56:23,945 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-11-27 00:56:41,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3652520.0, ans=0.0 2023-11-27 00:56:41,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2023-11-27 00:56:42,051 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6800, loss[loss=0.07394, simple_loss=0.1013, pruned_loss=0.01505, audio_tagging_loss=0.008229, over 14501.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08938, pruned_loss=0.01193, audio_tagging_loss=0.008571, over 3039724.17 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:56:53,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.978e+01 9.815e+01 1.051e+02 1.384e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-27 00:57:01,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.11 vs. limit=10.0 2023-11-27 00:57:04,897 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547900 2023-11-27 00:57:08,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. 
limit=8.0 2023-11-27 00:57:38,342 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6850, loss[loss=0.07732, simple_loss=0.104, pruned_loss=0.01675, audio_tagging_loss=0.008569, over 15113.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08991, pruned_loss=0.01205, audio_tagging_loss=0.008482, over 3040993.39 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:57:47,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3652853.3333333335, ans=0.125 2023-11-27 00:57:56,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2023-11-27 00:58:00,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547950 2023-11-27 00:58:15,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3653053.3333333335, ans=0.2 2023-11-27 00:58:34,346 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6900, loss[loss=0.07754, simple_loss=0.1004, pruned_loss=0.01815, audio_tagging_loss=0.009183, over 15579.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09017, pruned_loss=0.01215, audio_tagging_loss=0.008391, over 3041035.83 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:58:36,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3653186.6666666665, ans=0.125 2023-11-27 00:58:43,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3653186.6666666665, ans=0.2 2023-11-27 00:58:45,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.904e+01 9.598e+01 1.032e+02 1.208e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 00:58:47,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3653253.3333333335, ans=0.125 2023-11-27 00:58:49,533 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:58:56,310 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548000 2023-11-27 00:59:06,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3653320.0, ans=0.1 2023-11-27 00:59:10,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3653386.6666666665, ans=0.0 2023-11-27 00:59:19,634 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:59:21,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3653453.3333333335, ans=0.125 2023-11-27 00:59:31,362 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6950, loss[loss=0.08322, simple_loss=0.1131, pruned_loss=0.01885, audio_tagging_loss=0.007837, over 14934.00 frames. 
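The WARNING records exclude cuts, such as the dummy-text AudioSet clips, whose frame count after subsampling (23) is smaller than their token count (24): a transducer cannot emit more symbols than it has frames. The minimal rule consistent with the logged numbers is sketched below; the check in train_asr.py may add extra margin.

def should_exclude(frames_after_subsampling: int, tokens) -> bool:
    # Drop the cut if it has fewer subsampled frames than BPE tokens.
    return frames_after_subsampling < len(tokens)

# The logged case: 100 raw frames become 23 after subsampling, against a
# 24-token dummy transcript, so the cut is excluded.
tokens = ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a',
          '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore',
          '▁', 'this', '▁', 'if', '▁', 'possible']
assert should_exclude(23, tokens)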
], tot_loss[loss=0.06588, simple_loss=0.09051, pruned_loss=0.01214, audio_tagging_loss=0.008481, over 3039565.03 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:59:33,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=22.5 2023-11-27 00:59:48,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3653586.6666666665, ans=0.1 2023-11-27 00:59:54,818 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548050 2023-11-27 00:59:55,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3653653.3333333335, ans=0.125 2023-11-27 01:00:01,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3653653.3333333335, ans=0.05 2023-11-27 01:00:27,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3653853.3333333335, ans=0.2 2023-11-27 01:00:27,983 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7000, loss[loss=0.06009, simple_loss=0.07937, pruned_loss=0.01044, audio_tagging_loss=0.009966, over 14770.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08998, pruned_loss=0.01211, audio_tagging_loss=0.008476, over 3040906.51 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 01:00:33,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3653853.3333333335, ans=0.0 2023-11-27 01:00:39,178 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.912e+01 9.354e+01 1.017e+02 1.441e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 01:00:43,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3653920.0, ans=0.0 2023-11-27 01:00:49,660 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548100 2023-11-27 01:00:50,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2023-11-27 01:01:04,017 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:01:23,280 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7050, loss[loss=0.05344, simple_loss=0.06756, pruned_loss=0.009991, audio_tagging_loss=0.009673, over 15469.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08955, pruned_loss=0.01232, audio_tagging_loss=0.008581, over 3042507.33 frames. 
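The grad_scale field stepping among 8.0, 16.0 and 32.0 across these records is the signature of dynamic fp16 loss scaling: the scale doubles after a stretch of overflow-free steps and is halved on overflow. The standard torch.cuda.amp pattern below shows where such a number comes from; the constructor arguments are PyTorch's, and the specific values are assumptions about this run rather than settings read from it.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def fp16_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)       # skips the update if grads overflowed
    scaler.update()              # grows or backs off the scale
    return scaler.get_scale()    # the number logged as grad_scale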
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:01:32,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3654186.6666666665, ans=0.2 2023-11-27 01:01:36,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3654253.3333333335, ans=0.125 2023-11-27 01:01:43,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654320.0, ans=0.1 2023-11-27 01:01:44,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548150 2023-11-27 01:01:52,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3654320.0, ans=0.04949747468305833 2023-11-27 01:01:56,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2023-11-27 01:02:02,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3654386.6666666665, ans=0.0 2023-11-27 01:02:06,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3654453.3333333335, ans=0.125 2023-11-27 01:02:06,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3654453.3333333335, ans=0.0 2023-11-27 01:02:18,131 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7100, loss[loss=0.05744, simple_loss=0.07636, pruned_loss=0.01128, audio_tagging_loss=0.007981, over 14855.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08937, pruned_loss=0.0122, audio_tagging_loss=0.008636, over 3044959.18 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:02:27,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.49 vs. 
limit=22.5 2023-11-27 01:02:30,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.909e+01 9.590e+01 1.018e+02 1.394e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 01:02:30,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3654586.6666666665, ans=0.0 2023-11-27 01:02:33,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3654586.6666666665, ans=0.2 2023-11-27 01:02:34,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-27 01:02:34,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3654586.6666666665, ans=0.0 2023-11-27 01:02:35,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-27 01:02:38,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-27 01:02:40,402 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-27 01:02:44,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3654653.3333333335, ans=0.125 2023-11-27 01:02:49,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3654653.3333333335, ans=0.1 2023-11-27 01:02:52,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3654720.0, ans=0.125 2023-11-27 01:02:59,619 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0 2023-11-27 01:03:01,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3654786.6666666665, ans=0.1 2023-11-27 01:03:11,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3654786.6666666665, ans=0.025 2023-11-27 01:03:13,931 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7150, loss[loss=0.06461, simple_loss=0.08758, pruned_loss=0.01057, audio_tagging_loss=0.01025, over 16087.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09006, pruned_loss=0.01227, audio_tagging_loss=0.008676, over 3050530.41 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:03:34,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3654920.0, ans=0.125 2023-11-27 01:03:35,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. 
limit=22.5 2023-11-27 01:03:36,475 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-27 01:03:37,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654986.6666666665, ans=0.1 2023-11-27 01:03:37,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3654986.6666666665, ans=0.0 2023-11-27 01:03:44,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2023-11-27 01:04:05,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655120.0, ans=0.1 2023-11-27 01:04:09,605 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7200, loss[loss=0.07159, simple_loss=0.103, pruned_loss=0.01103, audio_tagging_loss=0.009059, over 15400.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08999, pruned_loss=0.01217, audio_tagging_loss=0.008724, over 3053986.06 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:04:09,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3655186.6666666665, ans=0.125 2023-11-27 01:04:17,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655186.6666666665, ans=0.1 2023-11-27 01:04:19,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655253.3333333335, ans=0.1 2023-11-27 01:04:22,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.112e+01 9.564e+01 1.040e+02 1.454e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:04:26,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-27 01:04:31,028 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-27 01:04:38,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3655320.0, ans=0.125 2023-11-27 01:04:43,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3655386.6666666665, ans=0.125 2023-11-27 01:04:46,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-11-27 01:05:04,728 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7250, loss[loss=0.06251, simple_loss=0.0766, pruned_loss=0.01416, audio_tagging_loss=0.01005, over 14046.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09049, pruned_loss=0.01229, audio_tagging_loss=0.008737, over 3050011.18 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:05:09,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3655520.0, ans=0.125 2023-11-27 01:05:27,666 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-27 01:05:31,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3655653.3333333335, ans=0.2 2023-11-27 01:05:37,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3655720.0, ans=0.0 2023-11-27 01:05:49,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3655786.6666666665, ans=0.125 2023-11-27 01:05:59,863 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7300, loss[loss=0.05954, simple_loss=0.0814, pruned_loss=0.01192, audio_tagging_loss=0.006927, over 15097.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09026, pruned_loss=0.01209, audio_tagging_loss=0.008645, over 3048784.44 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:06:14,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.978e+01 9.664e+01 1.039e+02 1.460e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 01:06:15,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-27 01:06:17,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3655920.0, ans=0.0 2023-11-27 01:06:23,172 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-27 01:06:39,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3656053.3333333335, ans=0.05 2023-11-27 01:06:51,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3656120.0, ans=0.125 2023-11-27 01:06:57,552 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7350, loss[loss=0.06592, simple_loss=0.08841, pruned_loss=0.01181, audio_tagging_loss=0.009902, over 16458.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09006, pruned_loss=0.01218, audio_tagging_loss=0.008513, over 3045508.99 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:07:01,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3656186.6666666665, ans=0.125 2023-11-27 01:07:18,774 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-27 01:07:52,348 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7400, loss[loss=0.05359, simple_loss=0.0793, pruned_loss=0.006921, audio_tagging_loss=0.007023, over 15324.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08932, pruned_loss=0.01197, audio_tagging_loss=0.008492, over 3049448.70 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:07:56,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2023-11-27 01:07:58,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3656520.0, ans=0.125 2023-11-27 01:08:05,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.855e+01 9.450e+01 1.015e+02 1.303e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 01:08:14,685 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-27 01:08:33,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2023-11-27 01:08:41,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0 2023-11-27 01:08:47,514 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7450, loss[loss=0.05787, simple_loss=0.07679, pruned_loss=0.01089, audio_tagging_loss=0.008586, over 15116.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.09014, pruned_loss=0.01211, audio_tagging_loss=0.008419, over 3052950.85 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:09:05,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3656920.0, ans=0.125 2023-11-27 01:09:10,531 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-27 01:09:16,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3656986.6666666665, ans=0.125 2023-11-27 01:09:20,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-27 01:09:22,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3657053.3333333335, ans=0.025 2023-11-27 01:09:36,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3657120.0, ans=0.0 2023-11-27 01:09:36,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3657120.0, ans=0.125 2023-11-27 01:09:40,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3657120.0, ans=0.0 2023-11-27 01:09:43,442 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7500, loss[loss=0.06062, simple_loss=0.08677, pruned_loss=0.009903, audio_tagging_loss=0.007333, over 15224.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08932, pruned_loss=0.01198, audio_tagging_loss=0.008504, over 3057447.15 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:09:45,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. limit=6.0 2023-11-27 01:09:53,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. 
limit=8.0 2023-11-27 01:09:57,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.963e+01 9.690e+01 1.036e+02 1.410e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 01:10:05,777 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-27 01:10:15,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3657320.0, ans=0.05 2023-11-27 01:10:20,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3657386.6666666665, ans=0.125 2023-11-27 01:10:23,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2023-11-27 01:10:25,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3657386.6666666665, ans=0.025 2023-11-27 01:10:32,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3657453.3333333335, ans=0.125 2023-11-27 01:10:36,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3657453.3333333335, ans=0.2 2023-11-27 01:10:39,350 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7550, loss[loss=0.05827, simple_loss=0.07797, pruned_loss=0.01216, audio_tagging_loss=0.007119, over 15388.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.0904, pruned_loss=0.01227, audio_tagging_loss=0.008448, over 3058643.88 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:10:41,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3657520.0, ans=0.05 2023-11-27 01:10:41,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3657520.0, ans=0.125 2023-11-27 01:10:44,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2023-11-27 01:10:48,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3657520.0, ans=0.125 2023-11-27 01:10:56,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3657586.6666666665, ans=0.125 2023-11-27 01:11:01,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-27 01:11:18,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3657720.0, ans=0.1 2023-11-27 01:11:34,186 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7600, loss[loss=0.08489, simple_loss=0.1127, pruned_loss=0.01787, audio_tagging_loss=0.01068, over 14691.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08974, pruned_loss=0.01216, audio_tagging_loss=0.008544, over 3054411.00 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:11:45,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.90 vs. 
limit=10.0 2023-11-27 01:11:46,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3657920.0, ans=0.07 2023-11-27 01:11:47,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.781e+01 9.560e+01 1.034e+02 1.331e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 01:11:49,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3657920.0, ans=0.125 2023-11-27 01:11:57,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-27 01:11:57,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5 2023-11-27 01:12:06,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3657986.6666666665, ans=0.125 2023-11-27 01:12:12,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3658053.3333333335, ans=0.125 2023-11-27 01:12:18,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3658120.0, ans=0.125 2023-11-27 01:12:25,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-27 01:12:30,372 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7650, loss[loss=0.06096, simple_loss=0.09713, pruned_loss=0.00642, audio_tagging_loss=0.005969, over 15261.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08991, pruned_loss=0.01204, audio_tagging_loss=0.008535, over 3058830.90 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:12:46,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3658253.3333333335, ans=0.125 2023-11-27 01:12:50,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.24 vs. limit=10.0 2023-11-27 01:12:52,765 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-27 01:12:55,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-27 01:13:05,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3658386.6666666665, ans=0.2 2023-11-27 01:13:08,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=15.0 2023-11-27 01:13:13,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2023-11-27 01:13:26,551 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7700, loss[loss=0.06005, simple_loss=0.06824, pruned_loss=0.01366, audio_tagging_loss=0.01228, over 14703.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09027, pruned_loss=0.0123, audio_tagging_loss=0.008503, over 3058912.94 frames. 
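The lr field decays very slowly here, moving from 1.47e-03 to 1.46e-03 over a few hundred batches. That is consistent with an Eden-style schedule that decays polynomially in both batch count and epoch; the sketch below uses assumed settings (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) and lands near 1.45e-03 at epoch 46 and batch index ~5.5e5, the same ballpark as the logged values. The exact figure depends on the fractional epoch, which this sketch ignores.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Polynomial decay in both batch count and epoch; parameter values
    # are assumptions for this run, not read from the scheduler code.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 548700, 46.0):.2e}")   # ~1.45e-03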
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:13:30,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3658520.0, ans=0.2 2023-11-27 01:13:40,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.982e+01 9.750e+01 1.038e+02 1.363e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 01:13:48,769 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-27 01:14:07,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3658720.0, ans=0.125 2023-11-27 01:14:21,534 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7750, loss[loss=0.04105, simple_loss=0.05679, pruned_loss=0.004632, audio_tagging_loss=0.008018, over 14937.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.0902, pruned_loss=0.0121, audio_tagging_loss=0.00853, over 3056908.39 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:14:25,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3658853.3333333335, ans=0.125 2023-11-27 01:14:44,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-27 01:14:45,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3658986.6666666665, ans=0.1 2023-11-27 01:14:55,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3659053.3333333335, ans=0.125 2023-11-27 01:15:01,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3659053.3333333335, ans=0.2 2023-11-27 01:15:15,421 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:15:17,383 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7800, loss[loss=0.08091, simple_loss=0.1068, pruned_loss=0.01598, audio_tagging_loss=0.01152, over 14988.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08927, pruned_loss=0.01191, audio_tagging_loss=0.008634, over 3062436.22 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:15:17,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3659186.6666666665, ans=0.07 2023-11-27 01:15:24,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3659186.6666666665, ans=0.125 2023-11-27 01:15:26,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3659186.6666666665, ans=0.1 2023-11-27 01:15:31,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.26 vs. 
limit=15.0 2023-11-27 01:15:31,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.098e+01 9.034e+01 9.648e+01 1.056e+02 1.237e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 01:15:33,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3659253.3333333335, ans=0.125 2023-11-27 01:15:36,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3659253.3333333335, ans=0.0 2023-11-27 01:15:39,536 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-27 01:15:44,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3659320.0, ans=0.0 2023-11-27 01:16:12,944 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7850, loss[loss=0.05208, simple_loss=0.07325, pruned_loss=0.007784, audio_tagging_loss=0.00767, over 15683.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09011, pruned_loss=0.01206, audio_tagging_loss=0.008651, over 3060927.32 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:16:13,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3659520.0, ans=0.125 2023-11-27 01:16:16,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3659520.0, ans=0.0 2023-11-27 01:16:35,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-27 01:16:50,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3659720.0, ans=0.2 2023-11-27 01:16:58,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3659786.6666666665, ans=0.0 2023-11-27 01:17:08,645 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7900, loss[loss=0.07657, simple_loss=0.09985, pruned_loss=0.01449, audio_tagging_loss=0.01215, over 15229.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.0901, pruned_loss=0.01219, audio_tagging_loss=0.008745, over 3055804.81 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:17:23,297 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.289e+01 9.929e+01 1.057e+02 1.408e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-27 01:17:31,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-27 01:18:00,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3660120.0, ans=0.1 2023-11-27 01:18:04,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3660186.6666666665, ans=0.125 2023-11-27 01:18:04,845 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7950, loss[loss=0.06847, simple_loss=0.09295, pruned_loss=0.01079, audio_tagging_loss=0.01121, over 15108.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09006, pruned_loss=0.01218, audio_tagging_loss=0.008828, over 3053654.54 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:18:18,138 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:18:22,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3660253.3333333335, ans=0.125 2023-11-27 01:18:26,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-27 01:18:26,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3660320.0, ans=0.125 2023-11-27 01:18:30,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3660320.0, ans=0.2 2023-11-27 01:18:30,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3660320.0, ans=0.125 2023-11-27 01:18:56,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3660453.3333333335, ans=0.0 2023-11-27 01:19:00,888 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8000, loss[loss=0.05421, simple_loss=0.07176, pruned_loss=0.009144, audio_tagging_loss=0.009185, over 14862.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08944, pruned_loss=0.01205, audio_tagging_loss=0.008943, over 3051104.68 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:19:09,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2023-11-27 01:19:12,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-27 01:19:14,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 9.017e+01 9.575e+01 1.027e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 01:19:22,512 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-27 01:19:25,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3660653.3333333335, ans=0.1 2023-11-27 01:19:33,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-27 01:19:35,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3660720.0, ans=0.04949747468305833 2023-11-27 01:19:52,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3660786.6666666665, ans=0.125 2023-11-27 01:19:55,666 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8050, loss[loss=0.08503, simple_loss=0.1206, pruned_loss=0.01862, audio_tagging_loss=0.006123, over 17147.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09033, pruned_loss=0.01224, audio_tagging_loss=0.008929, over 3049592.47 frames. 
], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:20:03,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3660853.3333333335, ans=0.125 2023-11-27 01:20:04,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2023-11-27 01:20:18,386 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-27 01:20:46,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3661120.0, ans=0.2 2023-11-27 01:20:48,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3661120.0, ans=0.2 2023-11-27 01:20:51,913 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8100, loss[loss=0.08356, simple_loss=0.1137, pruned_loss=0.01864, audio_tagging_loss=0.008092, over 15075.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09006, pruned_loss=0.01221, audio_tagging_loss=0.00877, over 3051673.88 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:21:07,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.808e+01 9.534e+01 1.042e+02 1.593e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:21:07,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3661253.3333333335, ans=0.0 2023-11-27 01:21:12,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3661320.0, ans=0.2 2023-11-27 01:21:13,708 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-27 01:21:20,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3661320.0, ans=0.125 2023-11-27 01:21:21,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3661320.0, ans=0.0 2023-11-27 01:21:25,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3661386.6666666665, ans=0.0 2023-11-27 01:21:47,907 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8150, loss[loss=0.04602, simple_loss=0.06204, pruned_loss=0.006205, audio_tagging_loss=0.008788, over 15356.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08972, pruned_loss=0.0122, audio_tagging_loss=0.008697, over 3049009.05 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:22:06,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3661586.6666666665, ans=0.04949747468305833 2023-11-27 01:22:07,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=12.0 2023-11-27 01:22:09,094 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-27 01:22:25,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3661720.0, ans=0.1 2023-11-27 01:22:41,926 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:22:42,946 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8200, loss[loss=0.05021, simple_loss=0.06375, pruned_loss=0.006947, audio_tagging_loss=0.01139, over 14052.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09013, pruned_loss=0.01213, audio_tagging_loss=0.008589, over 3056077.45 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:22:58,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.840e+01 9.434e+01 1.030e+02 1.387e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:23:05,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-27 01:23:29,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3662120.0, ans=0.125 2023-11-27 01:23:36,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=22.5 2023-11-27 01:23:38,470 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8250, loss[loss=0.07162, simple_loss=0.09306, pruned_loss=0.0174, audio_tagging_loss=0.007685, over 14558.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08985, pruned_loss=0.01206, audio_tagging_loss=0.008567, over 3059216.62 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:23:41,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3662186.6666666665, ans=0.1 2023-11-27 01:23:49,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3662253.3333333335, ans=0.125 2023-11-27 01:23:56,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3662253.3333333335, ans=0.125 2023-11-27 01:24:00,864 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-27 01:24:20,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3662386.6666666665, ans=0.02 2023-11-27 01:24:20,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3662386.6666666665, ans=0.125 2023-11-27 01:24:34,719 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8300, loss[loss=0.08086, simple_loss=0.119, pruned_loss=0.01318, audio_tagging_loss=0.008188, over 15593.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09016, pruned_loss=0.01198, audio_tagging_loss=0.008488, over 3053157.23 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:24:41,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3662520.0, ans=10.0 2023-11-27 01:24:47,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3662586.6666666665, ans=0.125 2023-11-27 01:24:49,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 9.008e+01 9.718e+01 1.064e+02 1.333e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 01:24:49,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3662586.6666666665, ans=0.125 2023-11-27 01:24:50,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3662586.6666666665, ans=0.2 2023-11-27 01:24:54,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3662586.6666666665, ans=0.1 2023-11-27 01:24:56,123 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-27 01:25:00,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3662653.3333333335, ans=0.0 2023-11-27 01:25:08,157 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:25:14,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2023-11-27 01:25:17,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3662720.0, ans=0.0 2023-11-27 01:25:29,694 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8350, loss[loss=0.06709, simple_loss=0.09024, pruned_loss=0.01249, audio_tagging_loss=0.009481, over 15103.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08971, pruned_loss=0.01178, audio_tagging_loss=0.008513, over 3055820.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:25:41,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3662920.0, ans=0.0 2023-11-27 01:25:52,448 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-27 01:26:24,622 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8400, loss[loss=0.07134, simple_loss=0.08862, pruned_loss=0.01805, audio_tagging_loss=0.008977, over 14512.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08996, pruned_loss=0.0119, audio_tagging_loss=0.008535, over 3054173.53 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:26:31,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3663186.6666666665, ans=0.125 2023-11-27 01:26:42,698 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.317e+01 1.002e+02 1.221e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 01:26:44,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3663253.3333333335, ans=10.0 2023-11-27 01:26:48,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-27 01:26:52,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3663320.0, ans=0.0 2023-11-27 01:27:02,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3663386.6666666665, ans=0.0 2023-11-27 01:27:20,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3663520.0, ans=0.125 2023-11-27 01:27:21,411 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8450, loss[loss=0.05382, simple_loss=0.07582, pruned_loss=0.005801, audio_tagging_loss=0.01011, over 15105.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08927, pruned_loss=0.01192, audio_tagging_loss=0.008575, over 3051227.08 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:27:34,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3663586.6666666665, ans=0.1 2023-11-27 01:27:43,112 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-27 01:28:01,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3663720.0, ans=0.0 2023-11-27 01:28:05,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3663786.6666666665, ans=0.0 2023-11-27 01:28:07,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3663786.6666666665, ans=0.125 2023-11-27 01:28:07,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3663786.6666666665, ans=0.125 2023-11-27 01:28:16,773 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8500, loss[loss=0.0728, simple_loss=0.09838, pruned_loss=0.01704, audio_tagging_loss=0.006578, over 14706.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08997, pruned_loss=0.0122, audio_tagging_loss=0.008486, over 3051751.19 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:28:32,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.917e+01 9.803e+01 1.059e+02 2.470e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 01:28:38,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3663986.6666666665, ans=0.125 2023-11-27 01:28:38,846 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-27 01:28:56,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3664053.3333333335, ans=0.0 2023-11-27 01:29:05,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-27 01:29:06,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3664120.0, ans=0.125 2023-11-27 01:29:07,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3664120.0, ans=0.035 2023-11-27 01:29:11,610 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8550, loss[loss=0.04748, simple_loss=0.06761, pruned_loss=0.005759, audio_tagging_loss=0.007912, over 14711.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08969, pruned_loss=0.0122, audio_tagging_loss=0.008556, over 3046151.61 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:29:26,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3664253.3333333335, ans=0.125 2023-11-27 01:29:29,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3664253.3333333335, ans=0.125 2023-11-27 01:29:35,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-27 01:29:37,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3664320.0, ans=0.125 2023-11-27 01:29:43,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3664320.0, ans=0.125 2023-11-27 01:29:55,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3664453.3333333335, ans=0.5 2023-11-27 01:30:00,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3664453.3333333335, ans=0.05 2023-11-27 01:30:01,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2023-11-27 01:30:02,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3664453.3333333335, ans=0.0 2023-11-27 01:30:07,866 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8600, loss[loss=0.06393, simple_loss=0.09343, pruned_loss=0.008336, audio_tagging_loss=0.008878, over 14758.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08977, pruned_loss=0.01221, audio_tagging_loss=0.00855, over 3046872.03 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:30:11,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3664520.0, ans=0.05 2023-11-27 01:30:17,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0 2023-11-27 01:30:24,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.820e+01 9.467e+01 9.988e+01 1.186e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:30:27,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3664586.6666666665, ans=0.04949747468305833 2023-11-27 01:30:29,571 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-27 01:30:29,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3664653.3333333335, ans=0.125 2023-11-27 01:30:30,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-27 01:30:43,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3664720.0, ans=0.0 2023-11-27 01:30:43,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3664720.0, ans=0.125 2023-11-27 01:30:44,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3664720.0, ans=15.0 2023-11-27 01:30:46,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3664720.0, ans=0.0 2023-11-27 01:31:01,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3664786.6666666665, ans=0.125 2023-11-27 01:31:03,576 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8650, loss[loss=0.06463, simple_loss=0.08532, pruned_loss=0.01323, audio_tagging_loss=0.008738, over 15027.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09137, pruned_loss=0.01251, audio_tagging_loss=0.008566, over 3048461.68 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:31:09,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. 
limit=15.0 2023-11-27 01:31:13,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3664920.0, ans=0.125 2023-11-27 01:31:19,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3664920.0, ans=0.125 2023-11-27 01:31:26,025 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-27 01:31:32,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3664986.6666666665, ans=0.0 2023-11-27 01:31:33,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3664986.6666666665, ans=0.125 2023-11-27 01:31:50,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3665120.0, ans=0.1 2023-11-27 01:31:58,592 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8700, loss[loss=0.04446, simple_loss=0.0604, pruned_loss=0.006408, audio_tagging_loss=0.00785, over 13856.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09078, pruned_loss=0.01243, audio_tagging_loss=0.008633, over 3050716.79 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:32:15,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.944e+01 9.069e+01 9.762e+01 1.053e+02 1.470e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 01:32:16,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3665253.3333333335, ans=0.07 2023-11-27 01:32:19,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3665253.3333333335, ans=0.1 2023-11-27 01:32:20,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3665320.0, ans=0.04949747468305833 2023-11-27 01:32:21,541 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-27 01:32:38,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3665386.6666666665, ans=0.2 2023-11-27 01:32:42,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3665453.3333333335, ans=0.125 2023-11-27 01:32:46,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3665453.3333333335, ans=0.0 2023-11-27 01:32:55,290 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8750, loss[loss=0.06749, simple_loss=0.09225, pruned_loss=0.01103, audio_tagging_loss=0.01033, over 14437.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09149, pruned_loss=0.01261, audio_tagging_loss=0.008792, over 3053848.89 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:32:58,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3665520.0, ans=0.125 2023-11-27 01:33:11,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. 
limit=15.0 2023-11-27 01:33:17,390 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-27 01:33:50,721 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8800, loss[loss=0.04439, simple_loss=0.04762, pruned_loss=0.008136, audio_tagging_loss=0.01244, over 15564.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.091, pruned_loss=0.0126, audio_tagging_loss=0.008874, over 3052815.51 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:34:01,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3665920.0, ans=0.125 2023-11-27 01:34:02,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3665920.0, ans=0.0 2023-11-27 01:34:08,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.927e+01 8.987e+01 9.532e+01 1.025e+02 1.979e+02, threshold=1.906e+02, percent-clipped=1.0 2023-11-27 01:34:13,062 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-27 01:34:16,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0 2023-11-27 01:34:32,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3666053.3333333335, ans=0.035 2023-11-27 01:34:32,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-27 01:34:32,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-27 01:34:36,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3666120.0, ans=0.2 2023-11-27 01:34:46,289 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8850, loss[loss=0.0512, simple_loss=0.07057, pruned_loss=0.005736, audio_tagging_loss=0.01018, over 16089.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09041, pruned_loss=0.01231, audio_tagging_loss=0.00881, over 3053266.77 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:34:55,324 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:35:01,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3666253.3333333335, ans=0.0 2023-11-27 01:35:08,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3666320.0, ans=0.125 2023-11-27 01:35:09,299 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-27 01:35:27,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. 
limit=15.0 2023-11-27 01:35:37,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3666453.3333333335, ans=0.125 2023-11-27 01:35:37,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2023-11-27 01:35:39,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3666453.3333333335, ans=0.125 2023-11-27 01:35:42,758 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8900, loss[loss=0.08431, simple_loss=0.1202, pruned_loss=0.01771, audio_tagging_loss=0.00653, over 16495.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09145, pruned_loss=0.01245, audio_tagging_loss=0.008721, over 3060820.66 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:36:00,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.952e+01 9.534e+01 1.026e+02 1.525e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:36:01,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3666586.6666666665, ans=0.07 2023-11-27 01:36:04,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3666653.3333333335, ans=0.125 2023-11-27 01:36:05,257 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-27 01:36:14,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2023-11-27 01:36:25,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666720.0, ans=0.1 2023-11-27 01:36:27,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3666786.6666666665, ans=0.125 2023-11-27 01:36:33,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3666786.6666666665, ans=0.125 2023-11-27 01:36:38,836 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8950, loss[loss=0.06781, simple_loss=0.09683, pruned_loss=0.0129, audio_tagging_loss=0.00649, over 15654.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09065, pruned_loss=0.01228, audio_tagging_loss=0.008664, over 3050780.70 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:36:43,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. 
limit=15.0 2023-11-27 01:36:46,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3666853.3333333335, ans=0.125 2023-11-27 01:36:51,255 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:37:00,462 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-27 01:37:16,118 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:37:34,274 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9000, loss[loss=0.05108, simple_loss=0.06914, pruned_loss=0.007441, audio_tagging_loss=0.009067, over 15233.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09058, pruned_loss=0.01225, audio_tagging_loss=0.008647, over 3051371.51 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:37:34,275 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 01:38:07,107 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05879, simple_loss=0.05049, pruned_loss=0.005306, audio_tagging_loss=0.02824, over 4681554.00 frames. 2023-11-27 01:38:07,108 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 01:38:09,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3667186.6666666665, ans=0.2 2023-11-27 01:38:13,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3667186.6666666665, ans=0.125 2023-11-27 01:38:19,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-27 01:38:20,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-27 01:38:24,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-27 01:38:25,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.928e+01 9.533e+01 1.025e+02 1.320e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:38:27,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3667320.0, ans=0.1 2023-11-27 01:38:29,402 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-27 01:38:55,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3667453.3333333335, ans=0.0 2023-11-27 01:39:02,708 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9050, loss[loss=0.09686, simple_loss=0.1409, pruned_loss=0.02197, audio_tagging_loss=0.004442, over 15551.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09085, pruned_loss=0.01217, audio_tagging_loss=0.00857, over 3052376.86 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 4.0 2023-11-27 01:39:23,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. 
limit=6.0 2023-11-27 01:39:25,130 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-27 01:39:58,485 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9100, loss[loss=0.05841, simple_loss=0.08546, pruned_loss=0.008708, audio_tagging_loss=0.006973, over 14427.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09032, pruned_loss=0.01212, audio_tagging_loss=0.00853, over 3051063.83 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:39:58,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3667853.3333333335, ans=0.2 2023-11-27 01:40:19,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.136e+01 9.567e+01 1.016e+02 1.322e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:40:21,400 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-27 01:40:38,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3668053.3333333335, ans=0.2 2023-11-27 01:40:42,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3668120.0, ans=0.2 2023-11-27 01:40:53,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3668186.6666666665, ans=0.0 2023-11-27 01:40:54,722 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9150, loss[loss=0.05966, simple_loss=0.07259, pruned_loss=0.01165, audio_tagging_loss=0.01172, over 15094.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08921, pruned_loss=0.01181, audio_tagging_loss=0.00863, over 3057834.56 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:41:00,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3668186.6666666665, ans=0.0 2023-11-27 01:41:13,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3668253.3333333335, ans=0.125 2023-11-27 01:41:16,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-27 01:41:19,853 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:41:32,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3668386.6666666665, ans=0.125 2023-11-27 01:41:39,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3668453.3333333335, ans=0.1 2023-11-27 01:41:50,293 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9200, loss[loss=0.07026, simple_loss=0.09733, pruned_loss=0.01157, audio_tagging_loss=0.01002, over 16293.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08873, pruned_loss=0.01178, audio_tagging_loss=0.008651, over 3056056.98 frames. 
], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:42:09,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.971e+01 9.683e+01 1.056e+02 2.334e+02, threshold=1.937e+02, percent-clipped=1.0 2023-11-27 01:42:12,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550300 2023-11-27 01:42:13,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3668653.3333333335, ans=0.125 2023-11-27 01:42:40,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3668786.6666666665, ans=0.0 2023-11-27 01:42:44,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-27 01:42:45,357 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9250, loss[loss=0.0634, simple_loss=0.0931, pruned_loss=0.007399, audio_tagging_loss=0.009457, over 15214.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08983, pruned_loss=0.01186, audio_tagging_loss=0.008457, over 3057409.91 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:42:48,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3668853.3333333335, ans=0.0 2023-11-27 01:43:06,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3668920.0, ans=10.0 2023-11-27 01:43:08,925 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550350 2023-11-27 01:43:10,119 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:43:41,746 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9300, loss[loss=0.07064, simple_loss=0.09374, pruned_loss=0.01679, audio_tagging_loss=0.006983, over 14937.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08924, pruned_loss=0.01179, audio_tagging_loss=0.008503, over 3056244.41 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:43:42,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3669186.6666666665, ans=0.125 2023-11-27 01:44:01,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 8.933e+01 9.435e+01 1.011e+02 1.310e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:44:04,054 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550400 2023-11-27 01:44:04,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3669320.0, ans=0.1 2023-11-27 01:44:05,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5 2023-11-27 01:44:11,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3669320.0, ans=0.125 2023-11-27 01:44:12,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3669320.0, ans=0.2 2023-11-27 01:44:15,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. 
limit=22.5 2023-11-27 01:44:17,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3669386.6666666665, ans=0.125 2023-11-27 01:44:34,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3669453.3333333335, ans=0.0 2023-11-27 01:44:36,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3669453.3333333335, ans=10.0 2023-11-27 01:44:38,040 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9350, loss[loss=0.06437, simple_loss=0.08641, pruned_loss=0.01355, audio_tagging_loss=0.007618, over 15255.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08908, pruned_loss=0.01204, audio_tagging_loss=0.008587, over 3052347.90 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:44:57,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3669586.6666666665, ans=0.125 2023-11-27 01:44:59,442 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550450 2023-11-27 01:45:03,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2023-11-27 01:45:10,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3669720.0, ans=0.125 2023-11-27 01:45:14,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3669720.0, ans=0.1 2023-11-27 01:45:26,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3669786.6666666665, ans=0.0 2023-11-27 01:45:33,196 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9400, loss[loss=0.07313, simple_loss=0.104, pruned_loss=0.01299, audio_tagging_loss=0.008145, over 15619.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0896, pruned_loss=0.01206, audio_tagging_loss=0.008637, over 3056760.46 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:45:35,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3669853.3333333335, ans=0.125 2023-11-27 01:45:35,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=22.5 2023-11-27 01:45:40,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3669853.3333333335, ans=0.125 2023-11-27 01:45:50,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3669920.0, ans=0.2 2023-11-27 01:45:54,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.852e+01 9.637e+01 1.052e+02 1.350e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 01:45:56,319 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550500 2023-11-27 01:46:01,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3669986.6666666665, ans=0.0 2023-11-27 01:46:08,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3670053.3333333335, ans=0.1 2023-11-27 01:46:10,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3670053.3333333335, ans=0.125 2023-11-27 01:46:24,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3670120.0, ans=0.0 2023-11-27 01:46:25,297 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:46:29,033 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9450, loss[loss=0.04441, simple_loss=0.05326, pruned_loss=0.007321, audio_tagging_loss=0.01046, over 16304.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08907, pruned_loss=0.01201, audio_tagging_loss=0.008727, over 3058292.92 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:46:34,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3670186.6666666665, ans=0.0 2023-11-27 01:46:42,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3670253.3333333335, ans=0.0 2023-11-27 01:46:50,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3670253.3333333335, ans=0.125 2023-11-27 01:46:51,977 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550550 2023-11-27 01:47:20,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3670453.3333333335, ans=0.0 2023-11-27 01:47:22,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3670453.3333333335, ans=0.0 2023-11-27 01:47:25,716 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9500, loss[loss=0.04318, simple_loss=0.05797, pruned_loss=0.003481, audio_tagging_loss=0.01072, over 14629.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08871, pruned_loss=0.01184, audio_tagging_loss=0.008879, over 3055628.63 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:47:30,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3670520.0, ans=0.1 2023-11-27 01:47:36,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-27 01:47:38,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-27 01:47:44,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2023-11-27 01:47:44,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.026e+01 9.482e+01 1.013e+02 1.263e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 01:47:46,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550600 2023-11-27 01:47:51,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.52 vs. limit=22.5 2023-11-27 01:48:13,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3670786.6666666665, ans=0.125 2023-11-27 01:48:17,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3670786.6666666665, ans=0.125 2023-11-27 01:48:20,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2023-11-27 01:48:20,639 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9550, loss[loss=0.06574, simple_loss=0.08872, pruned_loss=0.01155, audio_tagging_loss=0.009828, over 15736.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08904, pruned_loss=0.01188, audio_tagging_loss=0.008871, over 3046615.16 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:48:32,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3670920.0, ans=0.2 2023-11-27 01:48:43,569 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550650 2023-11-27 01:48:54,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671053.3333333335, ans=0.1 2023-11-27 01:48:55,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2023-11-27 01:49:11,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2023-11-27 01:49:15,920 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9600, loss[loss=0.0506, simple_loss=0.0679, pruned_loss=0.007754, audio_tagging_loss=0.0089, over 13710.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08956, pruned_loss=0.01193, audio_tagging_loss=0.008925, over 3054939.84 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:49:37,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.787e+01 9.468e+01 1.030e+02 1.227e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 01:49:39,340 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550700 2023-11-27 01:49:39,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3671320.0, ans=0.125 2023-11-27 01:49:42,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-27 01:49:45,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3671320.0, ans=0.125 2023-11-27 01:50:12,760 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9650, loss[loss=0.06101, simple_loss=0.07973, pruned_loss=0.01116, audio_tagging_loss=0.009986, over 15474.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08956, pruned_loss=0.01192, audio_tagging_loss=0.008878, over 3050759.85 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:50:18,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3671520.0, ans=0.0 2023-11-27 01:50:31,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3671586.6666666665, ans=0.0 2023-11-27 01:50:32,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3671586.6666666665, ans=0.0 2023-11-27 01:50:34,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550750 2023-11-27 01:50:50,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2023-11-27 01:50:57,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3671786.6666666665, ans=0.125 2023-11-27 01:51:08,161 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9700, loss[loss=0.06552, simple_loss=0.0942, pruned_loss=0.01106, audio_tagging_loss=0.007363, over 15355.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09037, pruned_loss=0.0121, audio_tagging_loss=0.00871, over 3045241.64 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:51:09,450 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:51:14,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3671853.3333333335, ans=0.1 2023-11-27 01:51:28,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.991e+01 9.696e+01 1.056e+02 1.366e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 01:51:29,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-27 01:51:30,120 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550800 2023-11-27 01:51:38,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. 
limit=15.0 2023-11-27 01:51:42,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3672053.3333333335, ans=0.125 2023-11-27 01:51:48,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3672053.3333333335, ans=0.07 2023-11-27 01:52:03,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3672186.6666666665, ans=0.5 2023-11-27 01:52:03,768 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9750, loss[loss=0.06038, simple_loss=0.07869, pruned_loss=0.008923, audio_tagging_loss=0.01212, over 14515.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08938, pruned_loss=0.01207, audio_tagging_loss=0.008666, over 3038390.14 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:52:14,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-27 01:52:17,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3672253.3333333335, ans=0.0 2023-11-27 01:52:27,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550850 2023-11-27 01:52:30,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3672320.0, ans=0.0 2023-11-27 01:52:41,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3672386.6666666665, ans=0.125 2023-11-27 01:52:59,830 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9800, loss[loss=0.07574, simple_loss=0.1015, pruned_loss=0.01692, audio_tagging_loss=0.00805, over 15537.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09061, pruned_loss=0.01227, audio_tagging_loss=0.008543, over 3041195.57 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:53:14,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.88 vs. limit=22.5 2023-11-27 01:53:15,848 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:53:20,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.968e+01 9.826e+01 1.047e+02 1.265e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-27 01:53:22,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550900 2023-11-27 01:53:22,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3672653.3333333335, ans=0.125 2023-11-27 01:53:41,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3672720.0, ans=0.5 2023-11-27 01:53:42,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3672720.0, ans=0.1 2023-11-27 01:53:47,924 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:53:48,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2023-11-27 01:53:54,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2023-11-27 01:53:55,695 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9850, loss[loss=0.06551, simple_loss=0.09359, pruned_loss=0.008983, audio_tagging_loss=0.009734, over 14550.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09014, pruned_loss=0.01223, audio_tagging_loss=0.008549, over 3044047.93 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:54:16,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3672986.6666666665, ans=0.1 2023-11-27 01:54:17,492 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550950 2023-11-27 01:54:21,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3672986.6666666665, ans=0.5 2023-11-27 01:54:28,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-27 01:54:29,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=15.0 2023-11-27 01:54:41,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3673120.0, ans=0.0 2023-11-27 01:54:49,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-27 01:54:50,680 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9900, loss[loss=0.0566, simple_loss=0.06605, pruned_loss=0.01149, audio_tagging_loss=0.01208, over 15982.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08989, pruned_loss=0.01214, audio_tagging_loss=0.008509, over 3044951.37 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:54:54,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3673186.6666666665, ans=0.05 2023-11-27 01:55:05,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3673253.3333333335, ans=0.0 2023-11-27 01:55:13,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.996e+01 9.617e+01 1.030e+02 1.836e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 01:55:13,727 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551000 2023-11-27 01:55:18,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3673320.0, ans=0.0 2023-11-27 01:55:26,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. 
limit=15.0 2023-11-27 01:55:41,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5 2023-11-27 01:55:47,257 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9950, loss[loss=0.06157, simple_loss=0.08686, pruned_loss=0.009809, audio_tagging_loss=0.008331, over 14837.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08989, pruned_loss=0.01227, audio_tagging_loss=0.008523, over 3045783.13 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:56:09,657 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551050 2023-11-27 01:56:10,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=15.0 2023-11-27 01:56:29,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3673720.0, ans=0.2 2023-11-27 01:56:32,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3673786.6666666665, ans=0.0 2023-11-27 01:56:42,969 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10000, loss[loss=0.06527, simple_loss=0.0832, pruned_loss=0.01483, audio_tagging_loss=0.008843, over 15984.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09007, pruned_loss=0.01231, audio_tagging_loss=0.008449, over 3046876.10 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:56:43,584 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-27 01:56:54,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3673920.0, ans=0.0 2023-11-27 01:57:02,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3673920.0, ans=0.125 2023-11-27 01:57:05,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.920e+01 9.463e+01 1.026e+02 1.255e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:57:05,544 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551100 2023-11-27 01:57:05,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3673986.6666666665, ans=0.125 2023-11-27 01:57:10,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3673986.6666666665, ans=0.125 2023-11-27 01:57:14,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3673986.6666666665, ans=0.125 2023-11-27 01:57:16,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. 
limit=15.0 2023-11-27 01:57:19,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3674053.3333333335, ans=0.0 2023-11-27 01:57:29,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3674120.0, ans=0.0 2023-11-27 01:57:29,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3674120.0, ans=0.0 2023-11-27 01:57:38,693 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10050, loss[loss=0.05601, simple_loss=0.07659, pruned_loss=0.01015, audio_tagging_loss=0.007562, over 14888.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09023, pruned_loss=0.01241, audio_tagging_loss=0.008501, over 3046616.56 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:57:55,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3674253.3333333335, ans=0.0 2023-11-27 01:58:01,642 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551150 2023-11-27 01:58:08,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3674320.0, ans=0.2 2023-11-27 01:58:20,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3674386.6666666665, ans=0.125 2023-11-27 01:58:34,211 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10100, loss[loss=0.05925, simple_loss=0.06702, pruned_loss=0.01318, audio_tagging_loss=0.01256, over 14595.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09079, pruned_loss=0.01263, audio_tagging_loss=0.008534, over 3047913.00 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:58:57,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.911e+01 9.483e+01 1.012e+02 1.276e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 01:58:57,204 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551200 2023-11-27 01:58:58,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3674653.3333333335, ans=0.125 2023-11-27 01:59:10,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3674720.0, ans=0.5 2023-11-27 01:59:18,100 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:59:23,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3674786.6666666665, ans=0.125 2023-11-27 01:59:30,785 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10150, loss[loss=0.06686, simple_loss=0.09878, pruned_loss=0.009472, audio_tagging_loss=0.008, over 15120.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09107, pruned_loss=0.01254, audio_tagging_loss=0.008627, over 3049908.15 frames. 
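
Each optim.py entry reports five grad-norm statistics (min, 25%, 50%, 75%, max over some window of recent steps), a clipping threshold, and the fraction of recent steps that were clipped. In every such entry here the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.483e+01 = 1.897e+02 just above, so clipping adapts to the running median norm instead of using a fixed constant. A sketch of that scheme follows; the window size and the bookkeeping details are assumptions:

from collections import deque

import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global gradient norms

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])).item()
        self.norms.append(norm)
        quartiles = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # scale * median
        clipped = norm > threshold
        if clipped:
            for g in grads:
                g.mul_(threshold / norm)
        return quartiles.tolist(), threshold, clipped

With percent-clipped at 0.0 for most entries in this stretch, the threshold is acting as a safety net against outlier batches rather than as an active constraint.
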
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:59:52,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551250 2023-11-27 01:59:55,504 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:59:58,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3674986.6666666665, ans=0.125 2023-11-27 01:59:58,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3674986.6666666665, ans=0.125 2023-11-27 02:00:01,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5 2023-11-27 02:00:21,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3675120.0, ans=0.07 2023-11-27 02:00:26,860 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10200, loss[loss=0.05946, simple_loss=0.08594, pruned_loss=0.00738, audio_tagging_loss=0.009114, over 15574.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09014, pruned_loss=0.01224, audio_tagging_loss=0.008658, over 3048936.08 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:00:31,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3675186.6666666665, ans=0.0 2023-11-27 02:00:41,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3675253.3333333335, ans=0.125 2023-11-27 02:00:45,952 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:00:49,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.864e+01 9.560e+01 1.043e+02 1.445e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:00:49,187 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551300 2023-11-27 02:00:57,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3675320.0, ans=0.0 2023-11-27 02:00:57,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2023-11-27 02:01:15,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. 
limit=15.0 2023-11-27 02:01:19,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3675453.3333333335, ans=0.125 2023-11-27 02:01:22,518 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10250, loss[loss=0.05512, simple_loss=0.0759, pruned_loss=0.009119, audio_tagging_loss=0.008051, over 14651.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08976, pruned_loss=0.012, audio_tagging_loss=0.008664, over 3052446.12 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:01:36,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-27 02:01:39,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3675586.6666666665, ans=0.125 2023-11-27 02:01:44,912 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551350 2023-11-27 02:02:18,465 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10300, loss[loss=0.08209, simple_loss=0.1224, pruned_loss=0.01413, audio_tagging_loss=0.006759, over 16286.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08962, pruned_loss=0.01201, audio_tagging_loss=0.008667, over 3048970.08 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:02:19,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3675853.3333333335, ans=0.125 2023-11-27 02:02:20,235 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=12.0 2023-11-27 02:02:40,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.971e+01 9.641e+01 1.026e+02 1.769e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 02:02:40,328 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551400 2023-11-27 02:03:00,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=22.5 2023-11-27 02:03:13,882 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10350, loss[loss=0.0643, simple_loss=0.08063, pruned_loss=0.01131, audio_tagging_loss=0.01268, over 13772.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08998, pruned_loss=0.01204, audio_tagging_loss=0.008754, over 3045109.58 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:03:14,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3676186.6666666665, ans=0.035 2023-11-27 02:03:24,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. 
limit=6.0 2023-11-27 02:03:28,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3676253.3333333335, ans=0.0 2023-11-27 02:03:29,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3676253.3333333335, ans=0.125 2023-11-27 02:03:36,795 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551450 2023-11-27 02:03:42,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3676320.0, ans=0.0 2023-11-27 02:04:05,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3676453.3333333335, ans=0.07 2023-11-27 02:04:09,399 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10400, loss[loss=0.08048, simple_loss=0.1084, pruned_loss=0.01801, audio_tagging_loss=0.008259, over 14384.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08937, pruned_loss=0.012, audio_tagging_loss=0.008903, over 3042351.27 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:04:12,412 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-11-27 02:04:28,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3676586.6666666665, ans=0.0 2023-11-27 02:04:32,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.109e+01 9.691e+01 1.057e+02 2.130e+02, threshold=1.938e+02, percent-clipped=1.0 2023-11-27 02:04:32,126 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551500 2023-11-27 02:04:37,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3676653.3333333335, ans=0.0 2023-11-27 02:04:43,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3676720.0, ans=0.125 2023-11-27 02:04:45,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3676720.0, ans=0.125 2023-11-27 02:05:05,172 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10450, loss[loss=0.06983, simple_loss=0.1006, pruned_loss=0.01203, audio_tagging_loss=0.007498, over 15541.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08896, pruned_loss=0.01175, audio_tagging_loss=0.008877, over 3042702.70 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:05:19,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3676920.0, ans=15.0 2023-11-27 02:05:23,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2023-11-27 02:05:23,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3676920.0, ans=0.125 2023-11-27 02:05:26,709 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551550 2023-11-27 02:05:37,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3677053.3333333335, ans=0.0 2023-11-27 02:05:46,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3677053.3333333335, ans=0.125 2023-11-27 02:06:00,595 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10500, loss[loss=0.07197, simple_loss=0.1033, pruned_loss=0.01299, audio_tagging_loss=0.007322, over 14439.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08909, pruned_loss=0.01196, audio_tagging_loss=0.008759, over 3042657.84 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:06:10,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3677253.3333333335, ans=0.125 2023-11-27 02:06:22,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 9.041e+01 9.594e+01 1.033e+02 2.053e+02, threshold=1.919e+02, percent-clipped=1.0 2023-11-27 02:06:23,069 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551600 2023-11-27 02:06:29,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=10.0 2023-11-27 02:06:38,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3677386.6666666665, ans=0.125 2023-11-27 02:06:46,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=15.0 2023-11-27 02:06:48,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3677453.3333333335, ans=0.0 2023-11-27 02:06:55,999 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10550, loss[loss=0.04755, simple_loss=0.06424, pruned_loss=0.006643, audio_tagging_loss=0.008789, over 14863.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08793, pruned_loss=0.01168, audio_tagging_loss=0.008701, over 3038167.42 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:07,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3677586.6666666665, ans=10.0 2023-11-27 02:07:09,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3677586.6666666665, ans=0.0 2023-11-27 02:07:19,455 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551650 2023-11-27 02:07:32,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3677720.0, ans=0.125 2023-11-27 02:07:50,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=22.5 2023-11-27 02:07:52,991 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10600, loss[loss=0.06595, simple_loss=0.08175, pruned_loss=0.01385, audio_tagging_loss=0.01123, over 16376.00 frames. 
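
The Whitening entries compare a per-module statistic against a limit (metric=14.21 vs. limit=15.0 a few entries above was unusually close). As a hedged reading of what that statistic measures, the sketch below computes mean(eig^2) / mean(eig)^2 over the eigenvalues of the activation covariance: it is exactly 1.0 when the covariance is a multiple of the identity (fully "white" activations) and grows with eigenvalue spread, so a whitening penalty would engage only once the metric crossed the logged limit:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations from one module.
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.t() @ x / x.shape[0]          # (C, C) feature covariance
    c = cov.shape[0]
    # For a symmetric matrix, (cov ** 2).sum() is the sum of squared
    # eigenvalues and cov.diag().mean() is the mean eigenvalue, so this
    # ratio equals mean(eig^2) / mean(eig)^2.
    return ((cov ** 2).sum() / (cov.diag().mean() ** 2 * c)).item()

x = torch.randn(10000, 384)                   # white features: metric ~ 1.0
print(whitening_metric(x))
print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # clearly > 1.0
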
], tot_loss[loss=0.06456, simple_loss=0.08854, pruned_loss=0.0117, audio_tagging_loss=0.008588, over 3038475.73 frames. ], batch size: 64, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:57,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3677853.3333333335, ans=0.0 2023-11-27 02:07:59,108 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:08:14,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.957e+01 9.483e+01 1.042e+02 1.260e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 02:08:14,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551700 2023-11-27 02:08:19,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677986.6666666665, ans=0.1 2023-11-27 02:08:21,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3677986.6666666665, ans=0.04949747468305833 2023-11-27 02:08:40,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3678120.0, ans=0.07 2023-11-27 02:08:48,502 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10650, loss[loss=0.03573, simple_loss=0.04551, pruned_loss=0.002619, audio_tagging_loss=0.01036, over 15149.00 frames. ], tot_loss[loss=0.06387, simple_loss=0.08756, pruned_loss=0.01151, audio_tagging_loss=0.008582, over 3036863.06 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:08:57,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3678186.6666666665, ans=0.2 2023-11-27 02:09:00,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3678253.3333333335, ans=0.125 2023-11-27 02:09:10,285 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551750 2023-11-27 02:09:17,927 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.99 vs. limit=10.0 2023-11-27 02:09:26,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-27 02:09:33,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2023-11-27 02:09:38,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3678453.3333333335, ans=0.125 2023-11-27 02:09:42,996 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10700, loss[loss=0.06097, simple_loss=0.07629, pruned_loss=0.01198, audio_tagging_loss=0.01084, over 14750.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08836, pruned_loss=0.01161, audio_tagging_loss=0.008528, over 3036455.61 frames. 
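
The loss fields on every training line in this section are internally consistent with

    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss,

e.g. for batch 10650 just above, 0.5 * 0.04551 + 0.002619 + 0.01036 = 0.03573. That is the familiar pruned-transducer pairing of a down-weighted "simple" (trivial-joiner) loss with the pruned loss, plus the audio-tagging distillation term at unit weight; the same identity holds for the tot_loss fields. A one-function sketch:

def total_loss(simple_loss: float, pruned_loss: float,
               audio_tagging_loss: float,
               simple_loss_scale: float = 0.5,
               audio_tagging_loss_scale: float = 1.0) -> float:
    # Scale values inferred from the logged numbers, not read from a config.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 10650 above logs loss=0.03573:
assert abs(total_loss(0.04551, 0.002619, 0.01036) - 0.03573) < 1e-5
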
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:09:53,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3678586.6666666665, ans=0.125 2023-11-27 02:10:06,393 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551800 2023-11-27 02:10:07,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.981e+01 9.456e+01 1.028e+02 1.264e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:10:12,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5 2023-11-27 02:10:37,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3678786.6666666665, ans=0.1 2023-11-27 02:10:40,254 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10750, loss[loss=0.09081, simple_loss=0.123, pruned_loss=0.02165, audio_tagging_loss=0.007645, over 15112.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08874, pruned_loss=0.01176, audio_tagging_loss=0.008496, over 3047527.33 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:10:46,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3678853.3333333335, ans=10.0 2023-11-27 02:10:47,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3678853.3333333335, ans=0.125 2023-11-27 02:11:00,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3678920.0, ans=0.2 2023-11-27 02:11:01,913 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551850 2023-11-27 02:11:02,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3678986.6666666665, ans=0.125 2023-11-27 02:11:07,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3678986.6666666665, ans=0.125 2023-11-27 02:11:10,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3678986.6666666665, ans=0.0 2023-11-27 02:11:35,295 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10800, loss[loss=0.07502, simple_loss=0.1067, pruned_loss=0.01428, audio_tagging_loss=0.007374, over 16322.00 frames. ], tot_loss[loss=0.06383, simple_loss=0.0877, pruned_loss=0.01148, audio_tagging_loss=0.00849, over 3056657.76 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:11:37,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3679186.6666666665, ans=0.125 2023-11-27 02:11:57,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551900 2023-11-27 02:11:59,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.776e+01 9.602e+01 1.034e+02 1.420e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 02:12:03,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3679320.0, ans=0.95 2023-11-27 02:12:05,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. 
limit=10.0 2023-11-27 02:12:05,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3679320.0, ans=0.1 2023-11-27 02:12:26,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3679453.3333333335, ans=0.125 2023-11-27 02:12:28,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3679453.3333333335, ans=0.125 2023-11-27 02:12:30,731 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10850, loss[loss=0.05633, simple_loss=0.07726, pruned_loss=0.008527, audio_tagging_loss=0.009167, over 15787.00 frames. ], tot_loss[loss=0.06372, simple_loss=0.08739, pruned_loss=0.01149, audio_tagging_loss=0.008541, over 3053697.89 frames. ], batch size: 64, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:12:45,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3679586.6666666665, ans=0.0 2023-11-27 02:12:51,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2023-11-27 02:12:54,113 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551950 2023-11-27 02:13:01,945 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2023-11-27 02:13:03,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3679720.0, ans=0.125 2023-11-27 02:13:20,986 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:13:21,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3679786.6666666665, ans=0.125 2023-11-27 02:13:26,284 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10900, loss[loss=0.06756, simple_loss=0.09331, pruned_loss=0.01313, audio_tagging_loss=0.007769, over 14839.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08853, pruned_loss=0.01187, audio_tagging_loss=0.00853, over 3052774.20 frames. 
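
The WARNING entries like the one just above all reject the same kind of cut: a 1-second AudioSet clip (100 fbank frames) carrying the dataset's dummy placeholder transcript. After the front-end's 4x subsampling only 23 frames survive, fewer than the 24 BPE tokens, and a transducer cannot emit more symbols than it has frames, so the cut is excluded. A sketch of the filter; the exact subsampling arithmetic is an assumption that happens to reproduce 100 -> 23:

def frames_after_subsampling(num_frames: int) -> int:
    # A common convolutional front-end formula: ((T - 7) // 2 + 1) // 2.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop cuts that end up with fewer encoder frames than output tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23, as in the warnings
print(keep_cut(100, 24))              # -> False: the cut is excluded
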
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:13:42,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3679920.0, ans=0.125 2023-11-27 02:13:44,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3679920.0, ans=0.2 2023-11-27 02:13:49,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552000 2023-11-27 02:13:53,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.930e+01 9.586e+01 1.062e+02 1.591e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 02:13:53,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3679986.6666666665, ans=0.05 2023-11-27 02:13:56,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3679986.6666666665, ans=0.015 2023-11-27 02:13:56,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3679986.6666666665, ans=0.035 2023-11-27 02:14:04,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3680053.3333333335, ans=10.0 2023-11-27 02:14:14,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3680120.0, ans=0.125 2023-11-27 02:14:25,458 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10950, loss[loss=0.07048, simple_loss=0.09154, pruned_loss=0.0142, audio_tagging_loss=0.0105, over 14706.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08863, pruned_loss=0.01183, audio_tagging_loss=0.008636, over 3048041.61 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:14:35,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3680253.3333333335, ans=0.09899494936611666 2023-11-27 02:14:46,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552050 2023-11-27 02:14:46,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:14:46,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680320.0, ans=0.1 2023-11-27 02:14:46,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:14:52,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:14:59,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-27 02:15:00,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.80 vs. 
limit=10.0 2023-11-27 02:15:08,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3680386.6666666665, ans=0.2 2023-11-27 02:15:10,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3680453.3333333335, ans=10.0 2023-11-27 02:15:15,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3680453.3333333335, ans=0.0 2023-11-27 02:15:20,767 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11000, loss[loss=0.07828, simple_loss=0.1195, pruned_loss=0.01319, audio_tagging_loss=0.005338, over 15959.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08879, pruned_loss=0.01177, audio_tagging_loss=0.008668, over 3050593.87 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:15:27,160 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:15:27,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680520.0, ans=0.125 2023-11-27 02:15:35,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3680586.6666666665, ans=0.125 2023-11-27 02:15:40,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680586.6666666665, ans=0.125 2023-11-27 02:15:43,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552100 2023-11-27 02:15:46,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.927e+01 9.605e+01 1.045e+02 1.330e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 02:15:46,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3680653.3333333335, ans=0.07 2023-11-27 02:16:03,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.78 vs. limit=10.0 2023-11-27 02:16:08,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3680786.6666666665, ans=0.125 2023-11-27 02:16:11,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3680786.6666666665, ans=0.2 2023-11-27 02:16:16,360 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11050, loss[loss=0.08531, simple_loss=0.1222, pruned_loss=0.01884, audio_tagging_loss=0.005383, over 15005.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08807, pruned_loss=0.01166, audio_tagging_loss=0.008736, over 3052714.40 frames. 
], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:16:28,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3680920.0, ans=0.05 2023-11-27 02:16:33,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2023-11-27 02:16:39,087 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552150 2023-11-27 02:16:41,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3680986.6666666665, ans=0.09899494936611666 2023-11-27 02:16:45,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680986.6666666665, ans=0.1 2023-11-27 02:16:45,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3680986.6666666665, ans=0.125 2023-11-27 02:16:46,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3680986.6666666665, ans=0.0 2023-11-27 02:17:13,201 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11100, loss[loss=0.07017, simple_loss=0.0938, pruned_loss=0.01112, audio_tagging_loss=0.01215, over 15588.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08916, pruned_loss=0.01196, audio_tagging_loss=0.008761, over 3052860.02 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:17:17,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3681186.6666666665, ans=0.0 2023-11-27 02:17:33,589 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:17:34,489 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552200 2023-11-27 02:17:34,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3681320.0, ans=0.025 2023-11-27 02:17:36,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.882e+01 9.437e+01 1.044e+02 2.360e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 02:17:46,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3681386.6666666665, ans=0.0 2023-11-27 02:17:47,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3681386.6666666665, ans=0.125 2023-11-27 02:17:55,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=12.0 2023-11-27 02:18:08,110 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11150, loss[loss=0.07601, simple_loss=0.1155, pruned_loss=0.01313, audio_tagging_loss=0.00514, over 15479.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08879, pruned_loss=0.01194, audio_tagging_loss=0.008881, over 3049935.25 frames. 
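
Many of the scheduled values above belong to balancer modules, whose fields bound per-channel activation statistics: min_positive / max_positive (ans=0.025 and ans=0.95 in nearby entries) bound the fraction of positive values in each channel, min_abs / max_abs bound the mean absolute value, and prob is the probability that the constraint is actually checked on a given step. A toy computation of the statistics such a module would inspect (the corrective-gradient machinery is omitted, and the tensor shapes are illustrative):

import torch

def balancer_stats(x: torch.Tensor):
    # x: (num_frames, num_channels) activations.
    proportion_positive = (x > 0).float().mean(dim=0)  # per channel
    mean_abs = x.abs().mean(dim=0)
    return proportion_positive, mean_abs

x = torch.randn(4096, 256)
pos, mabs = balancer_stats(x)
# A channel would attract a penalty if pos fell below min_positive, rose
# above max_positive, or mean_abs left the [min_abs, max_abs] band.
print(pos.min().item(), pos.max().item(), mabs.mean().item())
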
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:18:13,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3681520.0, ans=0.125 2023-11-27 02:18:21,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3681586.6666666665, ans=0.0 2023-11-27 02:18:27,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3681586.6666666665, ans=0.125 2023-11-27 02:18:29,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3681653.3333333335, ans=0.1 2023-11-27 02:18:30,431 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552250 2023-11-27 02:18:42,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2023-11-27 02:18:45,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3681720.0, ans=0.0 2023-11-27 02:18:59,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3681786.6666666665, ans=0.95 2023-11-27 02:19:03,611 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11200, loss[loss=0.06713, simple_loss=0.09425, pruned_loss=0.01171, audio_tagging_loss=0.008286, over 14643.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08896, pruned_loss=0.01199, audio_tagging_loss=0.008983, over 3049740.48 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:19:17,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3681920.0, ans=0.125 2023-11-27 02:19:18,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3681920.0, ans=0.125 2023-11-27 02:19:26,593 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552300 2023-11-27 02:19:28,646 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.953e+01 9.456e+01 1.019e+02 1.233e+02, threshold=1.891e+02, percent-clipped=1.0 2023-11-27 02:19:29,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2023-11-27 02:19:59,859 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11250, loss[loss=0.05076, simple_loss=0.07075, pruned_loss=0.009456, audio_tagging_loss=0.00593, over 15741.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08801, pruned_loss=0.01168, audio_tagging_loss=0.008966, over 3047970.75 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:20:21,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-27 02:20:26,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3682320.0, ans=0.125 2023-11-27 02:20:26,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3682320.0, ans=0.0 2023-11-27 02:20:27,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.92 vs. 
limit=15.0 2023-11-27 02:20:52,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3682453.3333333335, ans=0.125 2023-11-27 02:20:55,476 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11300, loss[loss=0.06103, simple_loss=0.08987, pruned_loss=0.009695, audio_tagging_loss=0.006403, over 14034.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08801, pruned_loss=0.01166, audio_tagging_loss=0.008758, over 3048785.48 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:20:57,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3682520.0, ans=0.1 2023-11-27 02:21:07,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3682586.6666666665, ans=0.125 2023-11-27 02:21:12,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3682586.6666666665, ans=0.125 2023-11-27 02:21:17,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-27 02:21:19,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2023-11-27 02:21:20,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3682653.3333333335, ans=0.0 2023-11-27 02:21:21,105 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 9.061e+01 9.736e+01 1.047e+02 2.003e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 02:21:22,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0 2023-11-27 02:21:23,752 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=22.5 2023-11-27 02:21:31,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3682720.0, ans=0.125 2023-11-27 02:21:47,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3682786.6666666665, ans=0.125 2023-11-27 02:21:50,790 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11350, loss[loss=0.0586, simple_loss=0.08716, pruned_loss=0.009357, audio_tagging_loss=0.005666, over 15623.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08841, pruned_loss=0.01177, audio_tagging_loss=0.008615, over 3045537.45 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:21:57,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3682853.3333333335, ans=0.2 2023-11-27 02:22:01,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3682920.0, ans=0.0 2023-11-27 02:22:13,332 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-27 02:22:42,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3683120.0, ans=0.2 2023-11-27 02:22:46,350 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11400, loss[loss=0.05073, simple_loss=0.06429, pruned_loss=0.007503, audio_tagging_loss=0.01109, over 14267.00 frames. 
], tot_loss[loss=0.06482, simple_loss=0.08892, pruned_loss=0.01185, audio_tagging_loss=0.008505, over 3046839.10 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:23:02,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2023-11-27 02:23:04,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3683253.3333333335, ans=0.125 2023-11-27 02:23:06,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3683253.3333333335, ans=0.1 2023-11-27 02:23:08,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-27 02:23:11,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 9.008e+01 9.574e+01 1.020e+02 1.271e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 02:23:24,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3683386.6666666665, ans=0.05 2023-11-27 02:23:40,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3683520.0, ans=0.2 2023-11-27 02:23:41,787 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11450, loss[loss=0.05931, simple_loss=0.08051, pruned_loss=0.01122, audio_tagging_loss=0.007845, over 14697.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08944, pruned_loss=0.01199, audio_tagging_loss=0.008461, over 3049390.41 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:23:44,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0 2023-11-27 02:23:56,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3683586.6666666665, ans=0.2 2023-11-27 02:24:03,996 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-27 02:24:10,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3683653.3333333335, ans=0.125 2023-11-27 02:24:20,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3683720.0, ans=0.0 2023-11-27 02:24:34,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=12.0 2023-11-27 02:24:37,127 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11500, loss[loss=0.04742, simple_loss=0.0625, pruned_loss=0.005833, audio_tagging_loss=0.01034, over 15497.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08932, pruned_loss=0.01197, audio_tagging_loss=0.008502, over 3048153.15 frames. 
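
The grad_scale field on the training lines wanders between 8.0 and 32.0 across this section, which is the signature of dynamic loss scaling in fp16 training: the scale is halved whenever inf/nan gradients are detected and grown back after a run of clean steps. A minimal sketch using PyTorch's stock GradScaler; the constructor arguments are illustrative, not this run's settings:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def fp16_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # skipped if inf/nan grads were found
    scaler.update()                # where the logged grad_scale changes
    return loss.detach(), scaler.get_scale()
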
], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:24:38,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3683853.3333333335, ans=0.0 2023-11-27 02:24:44,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3683853.3333333335, ans=0.125 2023-11-27 02:24:49,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3683920.0, ans=0.2 2023-11-27 02:24:59,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-27 02:25:03,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.865e+01 9.337e+01 9.934e+01 1.227e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 02:25:14,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3684053.3333333335, ans=0.125 2023-11-27 02:25:33,360 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11550, loss[loss=0.05702, simple_loss=0.07921, pruned_loss=0.006385, audio_tagging_loss=0.01103, over 15848.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08956, pruned_loss=0.01201, audio_tagging_loss=0.008546, over 3046371.11 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:25:54,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0 2023-11-27 02:25:55,498 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-27 02:26:04,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3684320.0, ans=0.1 2023-11-27 02:26:05,624 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:26:28,796 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11600, loss[loss=0.06031, simple_loss=0.08683, pruned_loss=0.009518, audio_tagging_loss=0.007376, over 14891.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08935, pruned_loss=0.01194, audio_tagging_loss=0.008587, over 3040361.34 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:26:43,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3684586.6666666665, ans=0.125 2023-11-27 02:26:43,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3684586.6666666665, ans=0.0 2023-11-27 02:26:46,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3684586.6666666665, ans=0.125 2023-11-27 02:26:50,921 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-27 02:26:53,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.971e+01 9.757e+01 1.054e+02 1.398e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 02:26:56,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3684653.3333333335, ans=0.1 2023-11-27 02:27:06,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3684720.0, ans=0.125 2023-11-27 02:27:13,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3684786.6666666665, ans=0.5 2023-11-27 02:27:21,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3684786.6666666665, ans=0.125 2023-11-27 02:27:24,034 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11650, loss[loss=0.08394, simple_loss=0.1216, pruned_loss=0.01718, audio_tagging_loss=0.005943, over 14965.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09004, pruned_loss=0.01209, audio_tagging_loss=0.00851, over 3037747.77 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:27:46,617 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-27 02:27:49,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3684986.6666666665, ans=0.0 2023-11-27 02:28:05,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-27 02:28:13,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685120.0, ans=0.1 2023-11-27 02:28:14,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3685120.0, ans=0.04949747468305833 2023-11-27 02:28:19,277 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11700, loss[loss=0.08816, simple_loss=0.1158, pruned_loss=0.02122, audio_tagging_loss=0.009053, over 14346.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08927, pruned_loss=0.01215, audio_tagging_loss=0.008628, over 3034355.42 frames. 
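The recurring `ScheduledFloat: name=..., batch_count=..., ans=...` records report module hyperparameters (dropout rates, balancer probabilities, skip rates) that are scheduled on the global batch count. A minimal piecewise-linear sketch of that idea is below; the breakpoints are made up for illustration and are not the recipe's actual schedules.

# Sketch of a batch-count-keyed schedule in the spirit of icefall's
# ScheduledFloat. The (batch_count, value) pairs below are illustrative.
import bisect

class PiecewiseLinear:
    def __init__(self, *points):            # e.g. (0, 0.3), (20000, 0.1)
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = PiecewiseLinear((0, 0.3), (20000, 0.1))
print(dropout_p(552500))  # far past the last breakpoint -> 0.1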
], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:28:24,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3685186.6666666665, ans=0.125 2023-11-27 02:28:39,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685253.3333333335, ans=0.125 2023-11-27 02:28:41,501 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-27 02:28:44,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.894e+01 9.455e+01 1.028e+02 1.281e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:28:47,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3685320.0, ans=0.125 2023-11-27 02:28:52,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3685386.6666666665, ans=0.0 2023-11-27 02:28:55,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3685386.6666666665, ans=0.5 2023-11-27 02:29:03,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685453.3333333335, ans=0.1 2023-11-27 02:29:04,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2023-11-27 02:29:12,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-11-27 02:29:15,649 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11750, loss[loss=0.04816, simple_loss=0.06624, pruned_loss=0.005611, audio_tagging_loss=0.009432, over 15276.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08897, pruned_loss=0.01217, audio_tagging_loss=0.008643, over 3035547.64 frames. 
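The `Whitening: name=..., metric=X vs. limit=Y` records fire when a layer's activation covariance is judged too far from white. One plausible formulation of such a metric, the eigenvalue-spread ratio of the feature covariance, is sketched below; it equals 1.0 for perfectly white features and grows with anisotropy. This is a stand-in for illustration only; the exact definition in scaling.py may differ.

# Hedged sketch of a whitening metric: mean squared eigenvalue of the
# feature covariance divided by the squared mean eigenvalue.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 384)   # nearly white -> metric close to 1
print(whitening_metric(x))
x[:, 0] *= 20.0              # one dominant direction -> much larger metric
print(whitening_metric(x))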
], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:29:26,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3685586.6666666665, ans=0.2 2023-11-27 02:29:32,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3685586.6666666665, ans=0.1 2023-11-27 02:29:37,999 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-27 02:29:38,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-27 02:29:42,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3685653.3333333335, ans=0.0 2023-11-27 02:29:52,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3685720.0, ans=0.125 2023-11-27 02:29:58,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3685720.0, ans=0.125 2023-11-27 02:30:02,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3685786.6666666665, ans=0.125 2023-11-27 02:30:10,481 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11800, loss[loss=0.06571, simple_loss=0.08577, pruned_loss=0.01209, audio_tagging_loss=0.01074, over 15049.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08898, pruned_loss=0.01232, audio_tagging_loss=0.008679, over 3036166.22 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:30:11,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3685853.3333333335, ans=0.0 2023-11-27 02:30:12,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=12.0 2023-11-27 02:30:13,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3685853.3333333335, ans=0.0 2023-11-27 02:30:28,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3685920.0, ans=0.125 2023-11-27 02:30:30,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3685920.0, ans=0.0 2023-11-27 02:30:32,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3685986.6666666665, ans=0.125 2023-11-27 02:30:33,998 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-27 02:30:36,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3685986.6666666665, ans=0.0 2023-11-27 02:30:37,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.095e+01 8.976e+01 9.582e+01 1.019e+02 1.579e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 02:30:41,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3685986.6666666665, ans=0.125 2023-11-27 02:30:45,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3686053.3333333335, ans=0.05 2023-11-27 02:30:47,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3686053.3333333335, ans=0.1 2023-11-27 02:30:56,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3686120.0, ans=0.0 2023-11-27 02:30:59,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3686120.0, ans=0.125 2023-11-27 02:31:06,366 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11850, loss[loss=0.07255, simple_loss=0.08875, pruned_loss=0.01808, audio_tagging_loss=0.0101, over 13616.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08921, pruned_loss=0.01239, audio_tagging_loss=0.008681, over 3036605.29 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:31:08,119 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.69 vs. limit=12.0 2023-11-27 02:31:14,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3686186.6666666665, ans=0.125 2023-11-27 02:31:25,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686253.3333333335, ans=0.1 2023-11-27 02:31:28,532 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-27 02:31:31,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3686320.0, ans=0.0 2023-11-27 02:31:40,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3686386.6666666665, ans=0.125 2023-11-27 02:31:51,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. 
limit=15.0 2023-11-27 02:32:02,453 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11900, loss[loss=0.06194, simple_loss=0.08606, pruned_loss=0.009227, audio_tagging_loss=0.009681, over 16157.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08917, pruned_loss=0.01218, audio_tagging_loss=0.008764, over 3037880.79 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:32:19,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3686586.6666666665, ans=0.2 2023-11-27 02:32:23,629 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-27 02:32:27,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.694e+01 9.517e+01 1.011e+02 1.462e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 02:32:31,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3686653.3333333335, ans=0.5 2023-11-27 02:32:33,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2023-11-27 02:32:45,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3686786.6666666665, ans=0.125 2023-11-27 02:32:45,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3686786.6666666665, ans=0.0 2023-11-27 02:32:57,305 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11950, loss[loss=0.05133, simple_loss=0.06692, pruned_loss=0.007483, audio_tagging_loss=0.01039, over 15947.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08907, pruned_loss=0.01211, audio_tagging_loss=0.008883, over 3048129.73 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:32:59,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3686853.3333333335, ans=0.0 2023-11-27 02:33:20,256 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553050 2023-11-27 02:33:29,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3686986.6666666665, ans=0.1 2023-11-27 02:33:35,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-11-27 02:33:40,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3687120.0, ans=0.0 2023-11-27 02:33:51,402 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 12000, loss[loss=0.0664, simple_loss=0.09092, pruned_loss=0.01117, audio_tagging_loss=0.009775, over 16554.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08916, pruned_loss=0.01214, audio_tagging_loss=0.008963, over 3049502.68 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:33:51,403 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 02:34:23,576 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05804, simple_loss=0.0505, pruned_loss=0.005297, audio_tagging_loss=0.02749, over 4681554.00 frames. 
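Each `Computing validation loss` record, together with the `Maximum memory allocated so far` line that follows it, corresponds to an evaluation pass over the dev set followed by a peak-memory report. A minimal sketch of that pattern is below; `model`, `valid_dl`, and `compute_loss` are placeholders assumed to exist, and the frame-weighted averaging mirrors the training-side tracker above.

# Sketch of the validation-plus-memory-report pattern seen in the log.
# `model`, `valid_dl` and `compute_loss` are assumed placeholders.
import torch

def validate(model, valid_dl, compute_loss, device="cuda:2"):
    model.eval()
    tot, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch)
            tot += float(loss) * num_frames
            frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot / frames:.5f}, over {frames:.2f} frames")
    print(f"Maximum memory allocated so far is {peak_mb}MB")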
2023-11-27 02:34:23,577 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 02:34:25,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3687186.6666666665, ans=0.125 2023-11-27 02:34:28,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3687186.6666666665, ans=0.125 2023-11-27 02:34:40,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3687253.3333333335, ans=0.0 2023-11-27 02:34:44,531 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553100 2023-11-27 02:35:18,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.955e+01 9.759e+01 1.053e+02 1.237e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 02:35:18,868 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 0, loss[loss=0.06593, simple_loss=0.07905, pruned_loss=0.007195, audio_tagging_loss=0.01921, over 14429.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.07905, pruned_loss=0.007195, audio_tagging_loss=0.01921, over 14429.00 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:35:18,869 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 02:35:50,398 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05785, simple_loss=0.05054, pruned_loss=0.005317, audio_tagging_loss=0.02726, over 4681554.00 frames. 2023-11-27 02:35:50,398 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 02:35:58,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3687340.0, ans=0.1 2023-11-27 02:36:34,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3687606.6666666665, ans=0.125 2023-11-27 02:36:42,314 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553150 2023-11-27 02:36:45,421 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 50, loss[loss=0.08406, simple_loss=0.106, pruned_loss=0.01166, audio_tagging_loss=0.01942, over 15110.00 frames. ], tot_loss[loss=0.07415, simple_loss=0.08973, pruned_loss=0.01215, audio_tagging_loss=0.01714, over 684853.72 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:36:49,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3687673.3333333335, ans=0.125 2023-11-27 02:36:53,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3687673.3333333335, ans=0.0 2023-11-27 02:37:04,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3687740.0, ans=10.0 2023-11-27 02:37:07,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3687806.6666666665, ans=0.125 2023-11-27 02:37:08,610 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:37:08,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3687806.6666666665, ans=0.125 2023-11-27 02:37:13,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3687806.6666666665, ans=0.125 2023-11-27 02:37:37,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553200 2023-11-27 02:37:41,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 9.815e+01 1.050e+02 1.145e+02 1.417e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-27 02:37:41,675 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 100, loss[loss=0.07924, simple_loss=0.1018, pruned_loss=0.01398, audio_tagging_loss=0.01436, over 16792.00 frames. ], tot_loss[loss=0.07278, simple_loss=0.08913, pruned_loss=0.01188, audio_tagging_loss=0.01633, over 1209713.71 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:37:42,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=12.0 2023-11-27 02:37:54,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3688073.3333333335, ans=0.2 2023-11-27 02:38:05,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3688140.0, ans=0.2 2023-11-27 02:38:12,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3688140.0, ans=0.2 2023-11-27 02:38:22,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-27 02:38:22,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-27 02:38:32,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3688273.3333333335, ans=0.1 2023-11-27 02:38:34,162 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553250 2023-11-27 02:38:37,320 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 150, loss[loss=0.06891, simple_loss=0.09513, pruned_loss=0.01073, audio_tagging_loss=0.01061, over 15641.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.08727, pruned_loss=0.01162, audio_tagging_loss=0.01471, over 1618020.20 frames. 
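The `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` records are consistent with a clipping threshold of clipping_scale times the running median gradient norm: in the record just above, 2.0 x 1.050e+02 = 2.101e+02, matching the logged threshold. A sketch of that scheme over a buffer of recent norms follows; the buffer size and exact percentile handling are assumptions.

# Hedged sketch of median-based gradient clipping as suggested by the
# log: threshold = clipping_scale * median of recent gradient norms.
from collections import deque
import torch

class MedianClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent norms (assumed window)
        self.clipped = 0
        self.seen = 0

    def clip_(self, params):
        params = list(params)              # allow a generator argument
        norm = torch.norm(torch.stack(
            [p.grad.norm() for p in params if p.grad is not None]))
        self.norms.append(float(norm))
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * float(q[2])
        self.seen += 1
        if float(norm) > threshold:
            self.clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / float(norm))
        print(f"grad-norm quartiles {q.tolist()}, "
              f"threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.clipped / self.seen:.1f}")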
], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:38:45,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3688340.0, ans=0.0 2023-11-27 02:38:53,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3688406.6666666665, ans=0.2 2023-11-27 02:38:57,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3688473.3333333335, ans=0.125 2023-11-27 02:39:29,315 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-27 02:39:32,408 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 200, loss[loss=0.07216, simple_loss=0.09984, pruned_loss=0.01148, audio_tagging_loss=0.01076, over 14690.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.08823, pruned_loss=0.01185, audio_tagging_loss=0.01293, over 1925197.61 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:39:32,606 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:39:34,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.211e+01 9.198e+01 9.713e+01 1.048e+02 1.227e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 02:40:09,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-27 02:40:15,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3688940.0, ans=0.0 2023-11-27 02:40:17,302 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2023-11-27 02:40:19,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688940.0, ans=0.1 2023-11-27 02:40:24,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-27 02:40:27,872 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 250, loss[loss=0.07774, simple_loss=0.116, pruned_loss=0.01214, audio_tagging_loss=0.007607, over 15605.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.08817, pruned_loss=0.01189, audio_tagging_loss=0.0116, over 2170122.72 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:40:33,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3689006.6666666665, ans=0.1 2023-11-27 02:40:44,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3689073.3333333335, ans=0.025 2023-11-27 02:40:44,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3689073.3333333335, ans=0.1 2023-11-27 02:40:51,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3689140.0, ans=0.1 2023-11-27 02:41:09,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3689206.6666666665, ans=0.0 2023-11-27 02:41:16,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3689273.3333333335, ans=0.2 2023-11-27 02:41:21,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-27 02:41:24,708 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 300, loss[loss=0.06632, simple_loss=0.08106, pruned_loss=0.01605, audio_tagging_loss=0.009734, over 13528.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.08867, pruned_loss=0.01198, audio_tagging_loss=0.01071, over 2367228.78 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:41:26,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.232e+01 1.015e+02 1.128e+02 1.500e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-27 02:41:28,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3689340.0, ans=0.1 2023-11-27 02:41:38,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3689406.6666666665, ans=0.125 2023-11-27 02:41:48,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3689473.3333333335, ans=0.035 2023-11-27 02:42:14,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3689606.6666666665, ans=0.0 2023-11-27 02:42:16,582 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-27 02:42:19,757 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 350, loss[loss=0.0553, simple_loss=0.0785, pruned_loss=0.006747, audio_tagging_loss=0.009308, over 14200.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08886, pruned_loss=0.01191, audio_tagging_loss=0.01015, over 2517375.26 frames. 
], batch size: 53, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:42:20,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3689673.3333333335, ans=0.125 2023-11-27 02:42:40,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3689740.0, ans=0.09899494936611666 2023-11-27 02:42:52,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3689873.3333333335, ans=0.125 2023-11-27 02:42:58,261 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:43:10,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3689940.0, ans=0.125 2023-11-27 02:43:11,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-27 02:43:13,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=12.0 2023-11-27 02:43:15,466 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 400, loss[loss=0.05166, simple_loss=0.06743, pruned_loss=0.008925, audio_tagging_loss=0.009022, over 14066.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08839, pruned_loss=0.01188, audio_tagging_loss=0.009838, over 2636929.77 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:43:18,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.006e+01 8.931e+01 9.402e+01 1.042e+02 1.214e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 02:43:38,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3690140.0, ans=0.04949747468305833 2023-11-27 02:43:49,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3690206.6666666665, ans=0.125 2023-11-27 02:44:08,451 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-27 02:44:11,508 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 450, loss[loss=0.05162, simple_loss=0.06937, pruned_loss=0.009436, audio_tagging_loss=0.007499, over 16806.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08793, pruned_loss=0.01188, audio_tagging_loss=0.009521, over 2736175.24 frames. 
], batch size: 64, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:44:21,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3690406.6666666665, ans=0.1 2023-11-27 02:44:24,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3690406.6666666665, ans=0.125 2023-11-27 02:44:34,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3690473.3333333335, ans=0.0 2023-11-27 02:44:36,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3690473.3333333335, ans=0.0 2023-11-27 02:44:39,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3690473.3333333335, ans=0.0 2023-11-27 02:44:41,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3690473.3333333335, ans=0.1 2023-11-27 02:44:55,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3690606.6666666665, ans=0.125 2023-11-27 02:45:03,911 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-27 02:45:07,280 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 500, loss[loss=0.05202, simple_loss=0.07327, pruned_loss=0.008226, audio_tagging_loss=0.007157, over 14263.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08796, pruned_loss=0.01186, audio_tagging_loss=0.009355, over 2801683.19 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:45:09,470 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.932e+01 9.491e+01 1.008e+02 1.797e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 02:45:19,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3690740.0, ans=0.125 2023-11-27 02:45:19,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.05 vs. limit=10.0 2023-11-27 02:45:24,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3690740.0, ans=0.2 2023-11-27 02:45:32,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3690806.6666666665, ans=0.0 2023-11-27 02:45:48,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3690873.3333333335, ans=0.125 2023-11-27 02:45:48,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3690873.3333333335, ans=0.125 2023-11-27 02:45:59,600 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-27 02:46:02,702 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 550, loss[loss=0.06339, simple_loss=0.0874, pruned_loss=0.01172, audio_tagging_loss=0.007969, over 15311.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08817, pruned_loss=0.01199, audio_tagging_loss=0.009228, over 2856680.33 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:46:37,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.09 vs. limit=15.0 2023-11-27 02:46:39,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3691206.6666666665, ans=0.125 2023-11-27 02:46:41,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2023-11-27 02:46:44,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2023-11-27 02:46:48,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3691273.3333333335, ans=0.2 2023-11-27 02:46:55,927 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-27 02:46:56,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-11-27 02:46:59,535 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 600, loss[loss=0.04544, simple_loss=0.0582, pruned_loss=0.006998, audio_tagging_loss=0.009337, over 14562.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08901, pruned_loss=0.01202, audio_tagging_loss=0.009112, over 2896631.95 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:47:01,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.835e+01 9.409e+01 1.013e+02 1.233e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 02:47:11,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3691406.6666666665, ans=0.125 2023-11-27 02:47:19,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3691406.6666666665, ans=0.125 2023-11-27 02:47:34,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3691540.0, ans=0.1 2023-11-27 02:47:42,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3691540.0, ans=0.125 2023-11-27 02:47:51,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-27 02:47:55,222 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 650, loss[loss=0.08297, simple_loss=0.1065, pruned_loss=0.02141, audio_tagging_loss=0.008288, over 14291.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08873, pruned_loss=0.01206, audio_tagging_loss=0.009042, over 2932594.91 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:00,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3691673.3333333335, ans=0.2 2023-11-27 02:48:05,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3691740.0, ans=0.125 2023-11-27 02:48:06,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.97 vs. 
limit=22.5 2023-11-27 02:48:10,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3691740.0, ans=0.0 2023-11-27 02:48:17,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3691806.6666666665, ans=0.125 2023-11-27 02:48:39,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-27 02:48:44,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3691940.0, ans=0.125 2023-11-27 02:48:45,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3691940.0, ans=0.2 2023-11-27 02:48:47,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-27 02:48:50,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3692006.6666666665, ans=0.0 2023-11-27 02:48:51,058 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 700, loss[loss=0.04669, simple_loss=0.05937, pruned_loss=0.008223, audio_tagging_loss=0.008785, over 16233.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08819, pruned_loss=0.01179, audio_tagging_loss=0.008953, over 2961756.14 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:53,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.860e+01 9.509e+01 1.038e+02 1.459e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 02:48:54,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3692006.6666666665, ans=0.025 2023-11-27 02:48:54,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3692006.6666666665, ans=0.0 2023-11-27 02:49:13,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3692140.0, ans=0.0 2023-11-27 02:49:38,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2023-11-27 02:49:44,264 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-27 02:49:47,904 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 750, loss[loss=0.08692, simple_loss=0.1223, pruned_loss=0.01414, audio_tagging_loss=0.01164, over 15725.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08917, pruned_loss=0.01191, audio_tagging_loss=0.00884, over 2979754.24 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:49:52,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3692340.0, ans=0.0 2023-11-27 02:50:15,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3692473.3333333335, ans=0.125 2023-11-27 02:50:17,231 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:50:40,013 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-27 02:50:43,116 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 800, loss[loss=0.06401, simple_loss=0.08193, pruned_loss=0.01381, audio_tagging_loss=0.00923, over 14248.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08893, pruned_loss=0.01191, audio_tagging_loss=0.00885, over 2984794.43 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:50:45,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.051e+01 9.571e+01 1.030e+02 1.342e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-27 02:51:07,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3692806.6666666665, ans=0.125 2023-11-27 02:51:18,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3692873.3333333335, ans=0.125 2023-11-27 02:51:19,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3692873.3333333335, ans=0.125 2023-11-27 02:51:29,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3692940.0, ans=0.125 2023-11-27 02:51:35,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-27 02:51:39,019 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 850, loss[loss=0.07095, simple_loss=0.09249, pruned_loss=0.01595, audio_tagging_loss=0.008757, over 15795.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08921, pruned_loss=0.0121, audio_tagging_loss=0.008845, over 2999096.76 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:51:44,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3693006.6666666665, ans=0.125 2023-11-27 02:51:44,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3693006.6666666665, ans=0.125 2023-11-27 02:52:05,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. 
limit=15.0 2023-11-27 02:52:18,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3693206.6666666665, ans=0.0 2023-11-27 02:52:29,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3693273.3333333335, ans=0.2 2023-11-27 02:52:32,224 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-27 02:52:33,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3693273.3333333335, ans=0.02 2023-11-27 02:52:34,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3693340.0, ans=0.125 2023-11-27 02:52:35,565 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 900, loss[loss=0.04437, simple_loss=0.05015, pruned_loss=0.006534, audio_tagging_loss=0.01276, over 14851.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08945, pruned_loss=0.01209, audio_tagging_loss=0.008894, over 3010066.57 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:52:39,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.856e+01 9.562e+01 1.034e+02 1.273e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:52:44,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-27 02:52:44,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3693340.0, ans=0.125 2023-11-27 02:52:46,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3693406.6666666665, ans=0.0 2023-11-27 02:52:47,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3693406.6666666665, ans=0.125 2023-11-27 02:52:47,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-11-27 02:52:51,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2023-11-27 02:52:52,224 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:53:01,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-27 02:53:18,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693540.0, ans=0.1 2023-11-27 02:53:25,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3693606.6666666665, ans=0.125 2023-11-27 02:53:28,079 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-27 02:53:31,190 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 950, loss[loss=0.05481, simple_loss=0.07549, pruned_loss=0.006904, audio_tagging_loss=0.01016, over 15368.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08933, pruned_loss=0.01218, audio_tagging_loss=0.008793, over 3018478.16 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:53:48,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2023-11-27 02:53:57,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-11-27 02:54:09,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3693873.3333333335, ans=0.0 2023-11-27 02:54:19,299 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=15.0 2023-11-27 02:54:19,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=22.5 2023-11-27 02:54:23,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-27 02:54:26,310 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1000, loss[loss=0.06505, simple_loss=0.08999, pruned_loss=0.01162, audio_tagging_loss=0.00843, over 15747.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08976, pruned_loss=0.01216, audio_tagging_loss=0.008642, over 3029778.27 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:54:28,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-27 02:54:29,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 9.006e+01 9.495e+01 1.025e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 02:54:37,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2023-11-27 02:54:42,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3694073.3333333335, ans=0.2 2023-11-27 02:54:47,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3694073.3333333335, ans=0.125 2023-11-27 02:54:49,612 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:54:51,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3694140.0, ans=0.0 2023-11-27 02:55:04,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3694206.6666666665, ans=0.2 2023-11-27 02:55:10,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.11 vs. 
limit=10.0 2023-11-27 02:55:19,482 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554150 2023-11-27 02:55:23,090 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1050, loss[loss=0.08379, simple_loss=0.09952, pruned_loss=0.02056, audio_tagging_loss=0.01347, over 15502.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08916, pruned_loss=0.012, audio_tagging_loss=0.008593, over 3035186.74 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:55:26,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3694340.0, ans=0.125 2023-11-27 02:55:26,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=22.5 2023-11-27 02:55:52,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3694473.3333333335, ans=0.125 2023-11-27 02:56:08,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=22.5 2023-11-27 02:56:13,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3694606.6666666665, ans=0.125 2023-11-27 02:56:14,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0 2023-11-27 02:56:15,437 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554200 2023-11-27 02:56:18,832 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1100, loss[loss=0.06755, simple_loss=0.09004, pruned_loss=0.01351, audio_tagging_loss=0.009023, over 15206.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08818, pruned_loss=0.01204, audio_tagging_loss=0.00864, over 3039974.51 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:56:19,890 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:56:21,956 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.965e+01 9.717e+01 1.039e+02 1.284e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 02:56:23,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3694673.3333333335, ans=0.0 2023-11-27 02:56:38,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3694740.0, ans=0.0 2023-11-27 02:56:39,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3694806.6666666665, ans=0.2 2023-11-27 02:56:42,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3694806.6666666665, ans=0.125 2023-11-27 02:56:44,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=12.0 2023-11-27 02:56:55,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2023-11-27 02:57:00,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3694873.3333333335, ans=0.0 2023-11-27 02:57:00,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3694873.3333333335, ans=0.0 2023-11-27 02:57:07,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3694940.0, ans=0.125 2023-11-27 02:57:10,368 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554250 2023-11-27 02:57:13,424 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1150, loss[loss=0.0588, simple_loss=0.07789, pruned_loss=0.007249, audio_tagging_loss=0.01261, over 14147.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08826, pruned_loss=0.01204, audio_tagging_loss=0.00861, over 3035329.33 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:57:30,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=22.5 2023-11-27 02:57:37,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0 2023-11-27 02:57:40,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2023-11-27 02:57:42,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3695140.0, ans=15.0 2023-11-27 02:57:49,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3695206.6666666665, ans=0.125 2023-11-27 02:57:50,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3695206.6666666665, ans=0.125 2023-11-27 02:58:02,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-27 02:58:05,417 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554300 2023-11-27 02:58:09,071 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1200, loss[loss=0.08043, simple_loss=0.1176, pruned_loss=0.01584, audio_tagging_loss=0.005816, over 15925.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08845, pruned_loss=0.01202, audio_tagging_loss=0.008598, over 3035126.82 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:58:09,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3695340.0, ans=0.2 2023-11-27 02:58:12,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 8.950e+01 9.657e+01 1.053e+02 1.302e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 02:58:28,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. 
limit=6.0 2023-11-27 02:59:02,060 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554350 2023-11-27 02:59:05,177 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1250, loss[loss=0.07496, simple_loss=0.1083, pruned_loss=0.01404, audio_tagging_loss=0.006771, over 15332.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08833, pruned_loss=0.01201, audio_tagging_loss=0.008622, over 3031483.05 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:59:11,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3695673.3333333335, ans=0.0 2023-11-27 02:59:14,956 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:59:19,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3695740.0, ans=0.125 2023-11-27 02:59:20,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3695740.0, ans=0.125 2023-11-27 02:59:24,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2023-11-27 02:59:33,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3695806.6666666665, ans=15.0 2023-11-27 02:59:35,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3695806.6666666665, ans=0.0 2023-11-27 02:59:50,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3695940.0, ans=0.2 2023-11-27 02:59:55,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695940.0, ans=0.1 2023-11-27 02:59:56,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695940.0, ans=0.1 2023-11-27 02:59:57,584 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554400 2023-11-27 03:00:01,009 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1300, loss[loss=0.06099, simple_loss=0.08224, pruned_loss=0.01101, audio_tagging_loss=0.008859, over 15124.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08875, pruned_loss=0.012, audio_tagging_loss=0.008628, over 3033087.21 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:00:04,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.987e+01 9.539e+01 1.033e+02 1.348e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 03:00:05,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3696006.6666666665, ans=0.125 2023-11-27 03:00:46,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3696273.3333333335, ans=0.125 2023-11-27 03:00:53,278 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554450 2023-11-27 03:00:56,943 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1350, loss[loss=0.04748, simple_loss=0.06911, pruned_loss=0.005116, audio_tagging_loss=0.007807, over 15250.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08915, pruned_loss=0.01207, audio_tagging_loss=0.008593, over 3037967.82 frames. 
2023-11-27 03:01:01,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3696340.0, ans=0.125
2023-11-27 03:01:11,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3696406.6666666665, ans=0.0
2023-11-27 03:01:22,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3696473.3333333335, ans=0.125
2023-11-27 03:01:35,752 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:01:41,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3696606.6666666665, ans=0.125
2023-11-27 03:01:50,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554500
2023-11-27 03:01:53,140 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1400, loss[loss=0.0628, simple_loss=0.08333, pruned_loss=0.01165, audio_tagging_loss=0.009484, over 15706.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08909, pruned_loss=0.01194, audio_tagging_loss=0.008597, over 3040184.23 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:01:57,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.796e+01 9.481e+01 1.017e+02 1.266e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-27 03:01:57,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3696673.3333333335, ans=0.125
2023-11-27 03:01:59,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3696673.3333333335, ans=0.125
2023-11-27 03:01:59,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3696673.3333333335, ans=0.0
2023-11-27 03:02:02,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3696740.0, ans=0.2
2023-11-27 03:02:06,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3696740.0, ans=0.125
2023-11-27 03:02:06,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3696740.0, ans=0.125
2023-11-27 03:02:06,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3696740.0, ans=0.125
2023-11-27 03:02:08,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3696740.0, ans=0.09899494936611666
2023-11-27 03:02:35,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3696873.3333333335, ans=0.0
2023-11-27 03:02:37,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3696940.0, ans=0.0
2023-11-27 03:02:44,922 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554550
2023-11-27 03:02:48,013 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1450, loss[loss=0.06187, simple_loss=0.07942, pruned_loss=0.01106, audio_tagging_loss=0.01111, over 14429.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08946, pruned_loss=0.01202, audio_tagging_loss=0.00873, over 3029817.02 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:02:52,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3697006.6666666665, ans=0.0
2023-11-27 03:03:08,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0
2023-11-27 03:03:21,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3697206.6666666665, ans=0.125
2023-11-27 03:03:39,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0
2023-11-27 03:03:40,304 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554600
2023-11-27 03:03:43,657 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1500, loss[loss=0.06021, simple_loss=0.08229, pruned_loss=0.01022, audio_tagging_loss=0.008848, over 15372.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09001, pruned_loss=0.01215, audio_tagging_loss=0.008791, over 3042218.31 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:03:48,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.023e+01 9.880e+01 1.062e+02 1.307e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-27 03:03:55,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3697406.6666666665, ans=0.125
2023-11-27 03:03:58,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3697406.6666666665, ans=0.0
2023-11-27 03:03:58,223 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:03:59,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3697406.6666666665, ans=0.125
2023-11-27 03:04:19,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.44 vs. limit=10.0
2023-11-27 03:04:32,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5
2023-11-27 03:04:36,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554650
2023-11-27 03:04:40,349 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1550, loss[loss=0.07721, simple_loss=0.1051, pruned_loss=0.01804, audio_tagging_loss=0.006613, over 14625.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08984, pruned_loss=0.0122, audio_tagging_loss=0.008868, over 3044054.03 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
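The train_asr.py:1481 WARNING above (03:01:35,752) is a length filter: after the encoder's roughly 4x subsampling, the 100-frame cut yields only 23 frames, fewer than its 24 BPE tokens, so no transducer alignment exists and the cut is dropped. A hedged sketch of that check, with names invented for illustration (the exact icefall code may differ):

    # Reproduces the numbers in the WARNING: ((100 - 7) // 2 + 1) // 2 = 23.
    def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
        frames_after = ((num_frames_before_subsampling - 7) // 2 + 1) // 2
        # A transducer needs at least one output frame per token; 23 < 24 fails.
        return frames_after >= num_tokens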
2023-11-27 03:04:41,621 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:04:53,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3697740.0, ans=0.1
2023-11-27 03:04:57,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3697740.0, ans=0.125
2023-11-27 03:05:32,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554700
2023-11-27 03:05:36,042 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1600, loss[loss=0.06883, simple_loss=0.09651, pruned_loss=0.0148, audio_tagging_loss=0.005772, over 14865.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08943, pruned_loss=0.01218, audio_tagging_loss=0.008994, over 3039744.75 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:05:41,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.979e+01 9.588e+01 1.025e+02 1.510e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-27 03:05:48,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3698073.3333333335, ans=0.2
2023-11-27 03:05:55,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3698073.3333333335, ans=0.125
2023-11-27 03:05:57,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3698140.0, ans=0.2
2023-11-27 03:05:57,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3698140.0, ans=0.0
2023-11-27 03:06:27,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554750
2023-11-27 03:06:31,093 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1650, loss[loss=0.07528, simple_loss=0.1009, pruned_loss=0.0142, audio_tagging_loss=0.01064, over 14621.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08917, pruned_loss=0.01205, audio_tagging_loss=0.008992, over 3050872.21 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:06:44,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3698406.6666666665, ans=0.2
2023-11-27 03:06:47,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1
2023-11-27 03:06:58,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3698473.3333333335, ans=0.125
2023-11-27 03:06:58,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3698473.3333333335, ans=0.0
2023-11-27 03:07:23,988 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554800
2023-11-27 03:07:27,472 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1700, loss[loss=0.05546, simple_loss=0.07368, pruned_loss=0.008877, audio_tagging_loss=0.009742, over 15102.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08978, pruned_loss=0.01217, audio_tagging_loss=0.009003, over 3047246.49 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:07:28,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3698673.3333333335, ans=0.0
2023-11-27 03:07:31,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3698673.3333333335, ans=0.125
2023-11-27 03:07:33,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.918e+01 9.493e+01 1.014e+02 1.179e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-27 03:07:47,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3698740.0, ans=0.125
2023-11-27 03:08:05,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0
2023-11-27 03:08:17,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.77 vs. limit=15.0
2023-11-27 03:08:20,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554850
2023-11-27 03:08:23,999 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1750, loss[loss=0.07673, simple_loss=0.1118, pruned_loss=0.01308, audio_tagging_loss=0.007735, over 15728.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08982, pruned_loss=0.01198, audio_tagging_loss=0.008859, over 3046481.98 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:08:25,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699006.6666666665, ans=0.1
2023-11-27 03:08:39,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699073.3333333335, ans=0.1
2023-11-27 03:08:40,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0
2023-11-27 03:08:41,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3699073.3333333335, ans=10.0
2023-11-27 03:08:55,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3699140.0, ans=0.125
2023-11-27 03:09:13,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0
2023-11-27 03:09:16,560 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554900
2023-11-27 03:09:16,715 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:09:19,709 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1800, loss[loss=0.0484, simple_loss=0.0629, pruned_loss=0.007846, audio_tagging_loss=0.009104, over 14625.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0895, pruned_loss=0.01189, audio_tagging_loss=0.00871, over 3049631.09 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0
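On the optim.py:476 entries: the five values appear to be quantiles of recently observed gradient norms (taking them as the 0/25/50/75/100 percentiles is an assumption), and the logged threshold matches Clipping_scale times the median, e.g. 2.0 * 9.588e+01 = 1.918e+02 in the 03:05:41 entry above. A minimal sketch under those assumptions:

    import torch

    def clipping_threshold(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> torch.Tensor:
        # threshold = clipping_scale * median of recent gradient norms;
        # percent-clipped then reports how often a batch exceeded it.
        return clipping_scale * recent_grad_norms.median()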
2023-11-27 03:09:26,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.924e+01 9.583e+01 9.926e+01 1.257e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 03:09:45,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0
2023-11-27 03:09:51,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699473.3333333335, ans=0.1
2023-11-27 03:10:12,785 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554950
2023-11-27 03:10:16,017 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1850, loss[loss=0.05772, simple_loss=0.08191, pruned_loss=0.007366, audio_tagging_loss=0.009399, over 14822.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09024, pruned_loss=0.01201, audio_tagging_loss=0.008582, over 3047507.77 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0
2023-11-27 03:10:21,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3699673.3333333335, ans=0.0
2023-11-27 03:10:23,302 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5
2023-11-27 03:10:38,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3699806.6666666665, ans=0.05
2023-11-27 03:10:53,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3699873.3333333335, ans=0.125
2023-11-27 03:10:59,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3699940.0, ans=0.2
2023-11-27 03:11:08,682 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555000
2023-11-27 03:11:11,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3700006.6666666665, ans=0.95
2023-11-27 03:11:12,131 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1900, loss[loss=0.06361, simple_loss=0.08029, pruned_loss=0.0121, audio_tagging_loss=0.01137, over 15789.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09002, pruned_loss=0.01197, audio_tagging_loss=0.008578, over 3051803.14 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 8.0
2023-11-27 03:11:13,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3700006.6666666665, ans=0.125
2023-11-27 03:11:16,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3700006.6666666665, ans=0.125
2023-11-27 03:11:18,506 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.107e+01 9.832e+01 1.049e+02 1.489e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-27 03:11:21,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3700006.6666666665, ans=0.125
2023-11-27 03:11:23,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3700073.3333333335, ans=0.125
2023-11-27 03:11:23,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3700073.3333333335, ans=0.0
2023-11-27 03:12:05,085 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555050
2023-11-27 03:12:08,237 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1950, loss[loss=0.04043, simple_loss=0.04541, pruned_loss=0.003772, audio_tagging_loss=0.01395, over 16408.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08879, pruned_loss=0.01171, audio_tagging_loss=0.008704, over 3053593.86 frames. ], batch size: 65, lr: 1.44e-03, grad_scale: 8.0
2023-11-27 03:12:14,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3700340.0, ans=0.1
2023-11-27 03:12:25,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3700406.6666666665, ans=0.05
2023-11-27 03:12:40,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0
2023-11-27 03:13:00,677 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555100
2023-11-27 03:13:04,283 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2000, loss[loss=0.06848, simple_loss=0.09029, pruned_loss=0.01445, audio_tagging_loss=0.008882, over 14813.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08789, pruned_loss=0.01166, audio_tagging_loss=0.008707, over 3047842.17 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
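The scaling.py:213 ScheduledFloat entries record hyperparameters (dropout probabilities, skip rates, minimum scales) that vary with batch_count; in the zipformer's scaling.py such a value is piecewise-linear in the batch count, constant past the last breakpoint. A sketch with invented (batch_count, value) breakpoints, for illustration only:

    def scheduled_float(batch_count: float, points=((0.0, 0.3), (20000.0, 0.1))) -> float:
        # Piecewise-linear in batch_count, clamped at both ends; by
        # batch_count ~3.7e6 (as above) the final value is what gets logged.
        xs, ys = zip(*points)
        if batch_count <= xs[0]:
            return ys[0]
        if batch_count >= xs[-1]:
            return ys[-1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)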
2023-11-27 03:13:04,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3700673.3333333335, ans=0.0
2023-11-27 03:13:08,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3700673.3333333335, ans=0.125
2023-11-27 03:13:11,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.780e+01 9.356e+01 1.007e+02 1.266e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 03:13:11,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3700673.3333333335, ans=0.0
2023-11-27 03:13:30,502 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:13:46,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3700873.3333333335, ans=0.0
2023-11-27 03:13:57,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555150
2023-11-27 03:13:58,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3700940.0, ans=0.125
2023-11-27 03:14:00,288 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2050, loss[loss=0.04726, simple_loss=0.06357, pruned_loss=0.004687, audio_tagging_loss=0.01079, over 14711.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08831, pruned_loss=0.01167, audio_tagging_loss=0.008688, over 3047866.91 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:14:09,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3701006.6666666665, ans=0.1
2023-11-27 03:14:10,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3701073.3333333335, ans=0.0
2023-11-27 03:14:23,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3701140.0, ans=0.125
2023-11-27 03:14:45,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3701273.3333333335, ans=0.0
2023-11-27 03:14:47,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=22.5
2023-11-27 03:14:48,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0
2023-11-27 03:14:52,361 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555200
2023-11-27 03:14:55,682 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2100, loss[loss=0.06637, simple_loss=0.08853, pruned_loss=0.01124, audio_tagging_loss=0.01087, over 15491.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08854, pruned_loss=0.01174, audio_tagging_loss=0.008655, over 3039621.90 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:15:02,568 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.818e+01 9.814e+01 1.041e+02 1.368e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-27 03:15:12,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3701406.6666666665, ans=0.07
2023-11-27 03:15:29,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3701540.0, ans=0.0
2023-11-27 03:15:44,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3701606.6666666665, ans=0.125
2023-11-27 03:15:49,126 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555250
2023-11-27 03:15:52,296 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2150, loss[loss=0.06389, simple_loss=0.09037, pruned_loss=0.01077, audio_tagging_loss=0.007939, over 15855.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08807, pruned_loss=0.01171, audio_tagging_loss=0.008615, over 3047295.00 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:15:53,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3701673.3333333335, ans=0.0
2023-11-27 03:16:07,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3701740.0, ans=0.125
2023-11-27 03:16:24,273 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:16:31,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0
2023-11-27 03:16:45,561 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555300
2023-11-27 03:16:48,653 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2200, loss[loss=0.05783, simple_loss=0.07844, pruned_loss=0.01051, audio_tagging_loss=0.008103, over 15511.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08834, pruned_loss=0.01174, audio_tagging_loss=0.008569, over 3047197.77 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:16:55,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.053e+01 9.078e+01 9.706e+01 1.033e+02 2.180e+02, threshold=1.941e+02, percent-clipped=1.0
2023-11-27 03:16:59,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3702073.3333333335, ans=0.0
2023-11-27 03:17:07,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3702073.3333333335, ans=0.125
2023-11-27 03:17:08,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=22.5
2023-11-27 03:17:33,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3702273.3333333335, ans=0.125
2023-11-27 03:17:33,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3702273.3333333335, ans=0.2
2023-11-27 03:17:36,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3702273.3333333335, ans=0.0
2023-11-27 03:17:39,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702273.3333333335, ans=0.1
2023-11-27 03:17:40,544 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555350
2023-11-27 03:17:43,601 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2250, loss[loss=0.06827, simple_loss=0.0904, pruned_loss=0.01491, audio_tagging_loss=0.008161, over 15696.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08898, pruned_loss=0.01168, audio_tagging_loss=0.008607, over 3048662.26 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:17:44,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3702340.0, ans=0.0
2023-11-27 03:17:58,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0
2023-11-27 03:18:02,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3702406.6666666665, ans=0.0
2023-11-27 03:18:10,820 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0
2023-11-27 03:18:18,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3702540.0, ans=0.125
2023-11-27 03:18:35,796 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555400
2023-11-27 03:18:35,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3702606.6666666665, ans=0.0
2023-11-27 03:18:39,676 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2300, loss[loss=0.08219, simple_loss=0.122, pruned_loss=0.01723, audio_tagging_loss=0.003952, over 15708.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08961, pruned_loss=0.01192, audio_tagging_loss=0.008574, over 3049142.66 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
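The scaling.py:1022 Whitening entries compare a per-module statistic of activations against a limit; the metric is near 1.0 when the channel covariance is isotropic and grows as variance concentrates in fewer directions, with a corrective gradient applied only when it exceeds the limit. One plausible form of such a metric (the exact formula in scaling.py may differ):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns 1.0 when the covariance is a
        # multiple of the identity, larger otherwise; entries like
        # "metric=10.93 vs. limit=15.0" above stay below the penalty threshold.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        return d * torch.trace(cov @ cov) / torch.trace(cov) ** 2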
2023-11-27 03:18:39,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3702673.3333333335, ans=0.125
2023-11-27 03:18:43,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3702673.3333333335, ans=0.125
2023-11-27 03:18:46,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.197e+01 9.925e+01 1.066e+02 1.274e+02, threshold=1.985e+02, percent-clipped=0.0
2023-11-27 03:18:48,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3702673.3333333335, ans=0.125
2023-11-27 03:19:15,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3702873.3333333335, ans=0.0
2023-11-27 03:19:18,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3702873.3333333335, ans=0.125
2023-11-27 03:19:27,956 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:19:32,266 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555450
2023-11-27 03:19:35,988 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2350, loss[loss=0.08494, simple_loss=0.11, pruned_loss=0.01797, audio_tagging_loss=0.01194, over 14444.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08896, pruned_loss=0.01186, audio_tagging_loss=0.008688, over 3038956.75 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:19:39,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3703006.6666666665, ans=0.125
2023-11-27 03:19:40,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3703006.6666666665, ans=0.125
2023-11-27 03:19:41,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3703006.6666666665, ans=0.0
2023-11-27 03:19:59,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3703140.0, ans=0.0
2023-11-27 03:20:13,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0
2023-11-27 03:20:27,932 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555500
2023-11-27 03:20:29,187 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:20:31,098 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2400, loss[loss=0.05485, simple_loss=0.0759, pruned_loss=0.007597, audio_tagging_loss=0.009301, over 15899.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08856, pruned_loss=0.01191, audio_tagging_loss=0.008795, over 3042227.85 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:20:37,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.851e+01 9.612e+01 1.018e+02 1.276e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-27 03:20:48,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3703406.6666666665, ans=0.0
2023-11-27 03:21:00,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0
2023-11-27 03:21:05,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3703540.0, ans=0.125
2023-11-27 03:21:23,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555550
2023-11-27 03:21:26,360 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2450, loss[loss=0.06761, simple_loss=0.08652, pruned_loss=0.01199, audio_tagging_loss=0.01236, over 14983.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08783, pruned_loss=0.01184, audio_tagging_loss=0.008869, over 3042958.62 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:21:27,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=22.5
2023-11-27 03:21:30,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3703673.3333333335, ans=0.0
2023-11-27 03:21:41,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5
2023-11-27 03:21:47,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3703740.0, ans=0.0
2023-11-27 03:21:48,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3703806.6666666665, ans=0.125
2023-11-27 03:22:04,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3703873.3333333335, ans=0.5
2023-11-27 03:22:12,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0
2023-11-27 03:22:14,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3703940.0, ans=0.0
2023-11-27 03:22:15,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3703940.0, ans=0.2
2023-11-27 03:22:19,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555600
2023-11-27 03:22:23,110 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2500, loss[loss=0.07015, simple_loss=0.08264, pruned_loss=0.01775, audio_tagging_loss=0.01108, over 16488.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08806, pruned_loss=0.01174, audio_tagging_loss=0.008866, over 3037836.35 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:22:25,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3704006.6666666665, ans=0.0
2023-11-27 03:22:30,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.086e+01 9.685e+01 1.022e+02 1.331e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 03:22:32,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2023-11-27 03:22:55,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3704206.6666666665, ans=0.2
2023-11-27 03:23:02,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3704206.6666666665, ans=0.2
2023-11-27 03:23:15,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555650
2023-11-27 03:23:18,857 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2550, loss[loss=0.05953, simple_loss=0.08244, pruned_loss=0.01005, audio_tagging_loss=0.008263, over 14671.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08842, pruned_loss=0.01172, audio_tagging_loss=0.008788, over 3041726.76 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:23:21,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3704340.0, ans=0.1
2023-11-27 03:23:29,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3704406.6666666665, ans=0.125
2023-11-27 03:23:32,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=22.5
2023-11-27 03:23:44,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0
2023-11-27 03:23:51,058 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 03:23:53,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3704540.0, ans=0.125
2023-11-27 03:24:11,001 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555700
2023-11-27 03:24:11,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3704606.6666666665, ans=0.05
2023-11-27 03:24:14,131 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2600, loss[loss=0.06607, simple_loss=0.09171, pruned_loss=0.01175, audio_tagging_loss=0.008459, over 15475.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08797, pruned_loss=0.01166, audio_tagging_loss=0.008757, over 3043716.41 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:24:22,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.053e+01 9.535e+01 1.024e+02 1.234e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-27 03:24:23,733 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0
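The grad_scale value in the batch records is the fp16 loss scale; it moves in powers of two (32.0 at batch 2550 above, 16.0 by batch 2600), which is the behaviour of a dynamic scaler that halves on overflow and doubles after a run of overflow-free steps. Illustrated with the stock torch.cuda.amp.GradScaler; the training script may wrap its own scaler, so this is a sketch, not the run's actual code:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
    )
    # Per step: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()
    # scaler.get_scale() would then report values like the grad_scale above.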
2023-11-27 03:24:55,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3704873.3333333335, ans=0.0
2023-11-27 03:25:07,321 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555750
2023-11-27 03:25:10,350 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2650, loss[loss=0.06516, simple_loss=0.08509, pruned_loss=0.01151, audio_tagging_loss=0.0111, over 15037.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.0879, pruned_loss=0.01178, audio_tagging_loss=0.008693, over 3043443.15 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:25:15,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3705006.6666666665, ans=0.0
2023-11-27 03:25:36,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3705140.0, ans=0.125
2023-11-27 03:25:36,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5
2023-11-27 03:25:47,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3705206.6666666665, ans=0.0
2023-11-27 03:26:02,911 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555800
2023-11-27 03:26:06,304 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2700, loss[loss=0.08654, simple_loss=0.128, pruned_loss=0.01523, audio_tagging_loss=0.007302, over 16851.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08859, pruned_loss=0.01197, audio_tagging_loss=0.008638, over 3043710.14 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:26:13,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 9.070e+01 9.755e+01 1.047e+02 1.495e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-27 03:26:19,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3705406.6666666665, ans=0.0
2023-11-27 03:26:29,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3705473.3333333335, ans=0.125
2023-11-27 03:26:40,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3705540.0, ans=0.125
2023-11-27 03:26:43,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3705540.0, ans=0.125
2023-11-27 03:26:58,382 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555850
2023-11-27 03:27:01,515 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2750, loss[loss=0.07024, simple_loss=0.09156, pruned_loss=0.01606, audio_tagging_loss=0.008405, over 15749.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08894, pruned_loss=0.01188, audio_tagging_loss=0.008583, over 3047779.35 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:27:02,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3705673.3333333335, ans=0.125
2023-11-27 03:27:07,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3705673.3333333335, ans=0.125
2023-11-27 03:27:16,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3705740.0, ans=0.125
2023-11-27 03:27:49,882 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:27:52,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3705940.0, ans=0.125
2023-11-27 03:27:54,260 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555900
2023-11-27 03:27:58,010 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2800, loss[loss=0.06514, simple_loss=0.09099, pruned_loss=0.01147, audio_tagging_loss=0.008181, over 14930.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08901, pruned_loss=0.01188, audio_tagging_loss=0.008582, over 3043355.96 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:28:05,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.970e+01 9.604e+01 1.036e+02 1.276e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-27 03:28:19,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3706140.0, ans=0.125
2023-11-27 03:28:29,512 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0
2023-11-27 03:28:32,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3706206.6666666665, ans=0.125
2023-11-27 03:28:50,582 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555950
2023-11-27 03:28:50,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3706273.3333333335, ans=0.125
2023-11-27 03:28:51,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0
2023-11-27 03:28:54,313 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2850, loss[loss=0.06196, simple_loss=0.08323, pruned_loss=0.01156, audio_tagging_loss=0.008779, over 15502.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.0899, pruned_loss=0.01206, audio_tagging_loss=0.008446, over 3037287.22 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:29:12,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3706406.6666666665, ans=0.04949747468305833
2023-11-27 03:29:24,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.21 vs. limit=10.0
2023-11-27 03:29:39,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0
2023-11-27 03:29:40,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3706606.6666666665, ans=0.125
2023-11-27 03:29:46,425 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556000
2023-11-27 03:29:46,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3706606.6666666665, ans=0.0
2023-11-27 03:29:52,003 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2900, loss[loss=0.06032, simple_loss=0.08392, pruned_loss=0.01055, audio_tagging_loss=0.007807, over 15103.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08901, pruned_loss=0.01197, audio_tagging_loss=0.008498, over 3036242.14 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:29:57,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0
2023-11-27 03:29:59,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.856e+01 9.574e+01 1.046e+02 1.351e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-27 03:30:10,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3706740.0, ans=0.125
2023-11-27 03:30:17,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0
2023-11-27 03:30:20,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3706806.6666666665, ans=0.1
2023-11-27 03:30:44,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556050
2023-11-27 03:30:47,477 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2950, loss[loss=0.05834, simple_loss=0.0679, pruned_loss=0.01335, audio_tagging_loss=0.01104, over 15500.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08929, pruned_loss=0.01209, audio_tagging_loss=0.008543, over 3036798.62 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0
2023-11-27 03:30:50,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3707006.6666666665, ans=0.1
2023-11-27 03:31:28,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0
2023-11-27 03:31:40,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556100
2023-11-27 03:31:44,034 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3000, loss[loss=0.08026, simple_loss=0.1021, pruned_loss=0.01842, audio_tagging_loss=0.01078, over 15817.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08938, pruned_loss=0.01207, audio_tagging_loss=0.008586, over 3038713.36 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:31:44,035 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 03:32:05,524 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2217, 3.9927, 3.7483, 3.2968], device='cuda:2')
2023-11-27 03:32:16,620 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05735, simple_loss=0.05053, pruned_loss=0.005352, audio_tagging_loss=0.02673, over 4681554.00 frames.
2023-11-27 03:32:16,621 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-27 03:32:19,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0
2023-11-27 03:32:22,971 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0
2023-11-27 03:32:24,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0
2023-11-27 03:32:25,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.179e+01 9.770e+01 1.041e+02 1.490e+02, threshold=1.954e+02, percent-clipped=0.0
2023-11-27 03:32:42,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3707473.3333333335, ans=0.125
2023-11-27 03:33:09,518 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556150
2023-11-27 03:33:13,152 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3050, loss[loss=0.05006, simple_loss=0.06175, pruned_loss=0.0087, audio_tagging_loss=0.01049, over 14866.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08996, pruned_loss=0.0121, audio_tagging_loss=0.008591, over 3033224.36 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0
2023-11-27 03:33:13,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3707673.3333333335, ans=0.2
2023-11-27 03:33:27,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3707740.0, ans=0.125
2023-11-27 03:33:33,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.60 vs. limit=22.5
2023-11-27 03:33:45,030 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 03:33:51,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=15.0
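The validation record at 03:32:16 fits the same combination inferred earlier for the training records: 0.5 * 0.05053 + 0.005352 + 0.02673 ~= 0.05735, the printed validation loss. Note that the audio_tagging_loss term (0.02673) is roughly three times its running training value (~0.0086) in the surrounding records, while simple_loss and pruned_loss are lower than in training.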
limit=15.0 2023-11-27 03:34:05,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556200 2023-11-27 03:34:07,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3707940.0, ans=0.0 2023-11-27 03:34:09,355 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3100, loss[loss=0.0555, simple_loss=0.07879, pruned_loss=0.007752, audio_tagging_loss=0.008357, over 13853.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09003, pruned_loss=0.01204, audio_tagging_loss=0.008561, over 3034637.80 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:34:14,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3708006.6666666665, ans=0.0 2023-11-27 03:34:17,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 9.052e+01 9.705e+01 1.059e+02 1.500e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 03:34:19,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3708073.3333333335, ans=0.1 2023-11-27 03:34:20,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3708073.3333333335, ans=0.1 2023-11-27 03:34:34,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3708140.0, ans=0.1 2023-11-27 03:34:47,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2023-11-27 03:35:01,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556250 2023-11-27 03:35:05,238 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3150, loss[loss=0.0736, simple_loss=0.09708, pruned_loss=0.0164, audio_tagging_loss=0.008659, over 15101.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08976, pruned_loss=0.01194, audio_tagging_loss=0.008602, over 3035181.26 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:35:23,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3708406.6666666665, ans=0.125 2023-11-27 03:35:24,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3708406.6666666665, ans=0.0 2023-11-27 03:35:37,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3708473.3333333335, ans=0.125 2023-11-27 03:35:43,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3708540.0, ans=0.0 2023-11-27 03:35:44,932 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2023-11-27 03:35:58,268 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556300 2023-11-27 03:35:59,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3708606.6666666665, ans=0.05 2023-11-27 03:36:01,313 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3200, loss[loss=0.09041, simple_loss=0.1158, pruned_loss=0.02055, audio_tagging_loss=0.01197, over 14642.00 frames. 
], tot_loss[loss=0.06522, simple_loss=0.08932, pruned_loss=0.01182, audio_tagging_loss=0.008739, over 3037220.43 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:36:08,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3708673.3333333335, ans=0.125 2023-11-27 03:36:10,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.966e+01 9.584e+01 1.017e+02 1.282e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 03:36:17,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3708740.0, ans=15.0 2023-11-27 03:36:20,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3708740.0, ans=0.125 2023-11-27 03:36:40,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3708873.3333333335, ans=0.2 2023-11-27 03:36:54,325 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556350 2023-11-27 03:36:57,422 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3250, loss[loss=0.05309, simple_loss=0.06353, pruned_loss=0.01041, audio_tagging_loss=0.01092, over 14211.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08881, pruned_loss=0.01171, audio_tagging_loss=0.008801, over 3030457.96 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:37:00,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3709006.6666666665, ans=0.0 2023-11-27 03:37:01,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=15.0 2023-11-27 03:37:11,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3709073.3333333335, ans=0.1 2023-11-27 03:37:15,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2023-11-27 03:37:17,397 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:37:24,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3709140.0, ans=0.2 2023-11-27 03:37:28,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3709140.0, ans=0.0 2023-11-27 03:37:47,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3709273.3333333335, ans=0.125 2023-11-27 03:37:49,590 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556400 2023-11-27 03:37:52,935 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3300, loss[loss=0.08663, simple_loss=0.1205, pruned_loss=0.0197, audio_tagging_loss=0.006681, over 15288.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08839, pruned_loss=0.01156, audio_tagging_loss=0.008757, over 3036630.84 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:37:56,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3709340.0, ans=0.0 2023-11-27 03:37:58,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3709340.0, ans=0.125 2023-11-27 03:38:02,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.106e+01 9.727e+01 1.041e+02 1.146e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 03:38:11,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=22.5 2023-11-27 03:38:13,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2023-11-27 03:38:15,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3709473.3333333335, ans=0.1 2023-11-27 03:38:16,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. limit=10.0 2023-11-27 03:38:26,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3709540.0, ans=0.2 2023-11-27 03:38:37,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3709606.6666666665, ans=0.125 2023-11-27 03:38:39,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3709606.6666666665, ans=0.125 2023-11-27 03:38:45,552 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556450 2023-11-27 03:38:49,267 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3350, loss[loss=0.06037, simple_loss=0.08334, pruned_loss=0.007544, audio_tagging_loss=0.01115, over 15331.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08867, pruned_loss=0.01161, audio_tagging_loss=0.008685, over 3040701.71 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:38:58,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3709673.3333333335, ans=0.125 2023-11-27 03:39:06,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3709740.0, ans=0.2 2023-11-27 03:39:09,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2023-11-27 03:39:31,456 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-11-27 03:39:42,801 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556500 2023-11-27 03:39:45,909 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3400, loss[loss=0.06557, simple_loss=0.08964, pruned_loss=0.01033, audio_tagging_loss=0.01042, over 16119.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08996, pruned_loss=0.01191, audio_tagging_loss=0.008527, over 3033821.33 frames. 
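Note on the loss records: each `loss[...]` / `tot_loss[...]` entry reports the combined training objective together with its components, and the totals logged in this section are consistent with total = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for batch 3300 above, 0.5 * 0.1205 + 0.0197 + 0.006681 ≈ 0.08663), with `tot_loss` a frame-weighted running aggregate over recent batches. A minimal sketch of that bookkeeping, using illustrative names rather than train_asr.py's actual API:

```python
# Minimal sketch (illustrative names, not train_asr.py's API): combine the
# per-batch losses the way the logged totals imply, and keep a frame-weighted
# running aggregate like the `tot_loss[...]` entries.
from collections import defaultdict

SIMPLE_LOSS_SCALE = 0.5         # inferred from the logged component sums
AUDIO_TAGGING_LOSS_SCALE = 1.0  # inferred likewise

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss):
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

class RunningLoss:
    """Frame-weighted running average, as in the tot_loss[... over N frames] entries."""
    def __init__(self):
        self.sums = defaultdict(float)
        self.frames = 0.0

    def update(self, per_frame_losses, num_frames):
        for name, value in per_frame_losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def average(self):
        return {name: s / self.frames for name, s in self.sums.items()}

# Batch 3300 above: 0.5 * 0.1205 + 0.0197 + 0.006681 ≈ 0.08663
print(combined_loss(0.1205, 0.0197, 0.006681))
```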
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:39:51,438 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:39:54,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3710006.6666666665, ans=0.2 2023-11-27 03:39:54,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3710006.6666666665, ans=0.0 2023-11-27 03:39:55,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 9.012e+01 9.564e+01 1.021e+02 1.293e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 03:40:00,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3710073.3333333335, ans=0.2 2023-11-27 03:40:02,926 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:40:04,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3710073.3333333335, ans=0.2 2023-11-27 03:40:14,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=22.5 2023-11-27 03:40:21,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2023-11-27 03:40:25,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-27 03:40:38,002 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556550 2023-11-27 03:40:41,078 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3450, loss[loss=0.05528, simple_loss=0.08501, pruned_loss=0.007549, audio_tagging_loss=0.005232, over 14485.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08978, pruned_loss=0.01192, audio_tagging_loss=0.008492, over 3039912.19 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:40:42,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3710340.0, ans=0.125 2023-11-27 03:40:43,459 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:40:57,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710406.6666666665, ans=0.1 2023-11-27 03:41:00,072 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.58 vs. 
limit=15.0 2023-11-27 03:41:14,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3710540.0, ans=0.125 2023-11-27 03:41:25,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3710606.6666666665, ans=0.0 2023-11-27 03:41:25,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3710606.6666666665, ans=0.0 2023-11-27 03:41:27,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3710606.6666666665, ans=0.125 2023-11-27 03:41:32,642 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556600 2023-11-27 03:41:36,599 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3500, loss[loss=0.06934, simple_loss=0.09272, pruned_loss=0.01513, audio_tagging_loss=0.007845, over 14262.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08991, pruned_loss=0.01198, audio_tagging_loss=0.008455, over 3033213.79 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:41:47,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.850e+01 9.448e+01 1.007e+02 1.285e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 03:42:05,879 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:42:14,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710873.3333333335, ans=0.1 2023-11-27 03:42:29,905 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556650 2023-11-27 03:42:31,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2023-11-27 03:42:33,578 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3550, loss[loss=0.07554, simple_loss=0.108, pruned_loss=0.01312, audio_tagging_loss=0.008417, over 16636.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08945, pruned_loss=0.01198, audio_tagging_loss=0.008488, over 3038642.82 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:42:46,653 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:42:52,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3711073.3333333335, ans=10.0 2023-11-27 03:42:55,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.54 vs. 
limit=15.0 2023-11-27 03:42:57,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3711140.0, ans=0.0 2023-11-27 03:43:02,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3711140.0, ans=0.125 2023-11-27 03:43:05,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3711206.6666666665, ans=0.07 2023-11-27 03:43:17,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=12.0 2023-11-27 03:43:25,857 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556700 2023-11-27 03:43:29,041 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3600, loss[loss=0.06762, simple_loss=0.09311, pruned_loss=0.01395, audio_tagging_loss=0.007107, over 15411.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08993, pruned_loss=0.01212, audio_tagging_loss=0.008503, over 3041710.85 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:43:38,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.614e+01 9.369e+01 1.008e+02 1.433e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 03:43:43,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3711406.6666666665, ans=0.125 2023-11-27 03:43:49,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-27 03:43:59,960 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=12.0 2023-11-27 03:44:01,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3711540.0, ans=0.125 2023-11-27 03:44:20,563 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-27 03:44:23,827 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3650, loss[loss=0.06666, simple_loss=0.09051, pruned_loss=0.01137, audio_tagging_loss=0.01004, over 14818.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08898, pruned_loss=0.01202, audio_tagging_loss=0.008552, over 3038346.52 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:44:36,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3711740.0, ans=0.1 2023-11-27 03:44:36,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3711740.0, ans=0.125 2023-11-27 03:44:44,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2023-11-27 03:45:17,588 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-27 03:45:20,974 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3700, loss[loss=0.06131, simple_loss=0.08368, pruned_loss=0.008946, audio_tagging_loss=0.01052, over 15568.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08985, pruned_loss=0.0121, audio_tagging_loss=0.008509, over 3046024.69 frames. 
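Note on the `optim.py` records: the `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` lines list five running quantiles (apparently min/25%/median/75%/max) of recent gradient norms, and the logged thresholds match clipping_scale times the median (directly above, 2.0 * 9.369e+01 ≈ 1.874e+02); `percent-clipped` is the share of recent batches whose norm exceeded that threshold. A sketch of such median-based clipping, consistent with the logged numbers but not icefall's actual optimizer:

```python
# Sketch of median-based gradient clipping consistent with the logged records
# (illustrative only; the real logic lives in icefall's optim.py).
import torch

class MedianClipper:
    def __init__(self, clipping_scale=2.0, window=1000):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []      # recent global grad norms
        self.clipped = 0     # batches clipped so far

    def clip_(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 x running median
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return q.tolist(), threshold
```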
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:45:27,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3712006.6666666665, ans=0.5 2023-11-27 03:45:28,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-27 03:45:31,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 9.060e+01 9.619e+01 1.026e+02 1.251e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 03:45:31,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3712073.3333333335, ans=0.1 2023-11-27 03:45:42,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3712140.0, ans=0.0 2023-11-27 03:45:57,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3712206.6666666665, ans=0.0 2023-11-27 03:45:57,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3712206.6666666665, ans=0.0 2023-11-27 03:45:58,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3712206.6666666665, ans=0.2 2023-11-27 03:46:12,788 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:46:13,710 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-27 03:46:16,765 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3750, loss[loss=0.06967, simple_loss=0.09111, pruned_loss=0.01491, audio_tagging_loss=0.009203, over 15518.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08982, pruned_loss=0.01199, audio_tagging_loss=0.008597, over 3045713.41 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:46:19,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3712340.0, ans=0.0 2023-11-27 03:46:34,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3712406.6666666665, ans=0.2 2023-11-27 03:46:48,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3712473.3333333335, ans=0.2 2023-11-27 03:46:54,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3712540.0, ans=0.0 2023-11-27 03:46:54,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3712540.0, ans=0.025 2023-11-27 03:46:54,985 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 03:46:59,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3712540.0, ans=0.1 2023-11-27 03:47:07,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-27 03:47:08,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-27 03:47:09,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3712606.6666666665, ans=0.125 2023-11-27 03:47:10,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3712606.6666666665, ans=0.125 2023-11-27 03:47:12,069 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3800, loss[loss=0.05837, simple_loss=0.07585, pruned_loss=0.01177, audio_tagging_loss=0.008677, over 14410.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09012, pruned_loss=0.01209, audio_tagging_loss=0.008691, over 3050754.71 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:47:17,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3712673.3333333335, ans=0.02 2023-11-27 03:47:24,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.007e+01 9.077e+01 9.693e+01 1.049e+02 1.287e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 03:47:46,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.61 vs. limit=22.5 2023-11-27 03:48:00,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-27 03:48:02,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3712940.0, ans=0.1 2023-11-27 03:48:05,203 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-27 03:48:08,323 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3850, loss[loss=0.06221, simple_loss=0.08632, pruned_loss=0.0094, audio_tagging_loss=0.009645, over 15089.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09024, pruned_loss=0.01188, audio_tagging_loss=0.008621, over 3059325.30 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:48:19,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-11-27 03:48:27,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3713073.3333333335, ans=0.125 2023-11-27 03:48:29,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3713140.0, ans=0.125 2023-11-27 03:48:36,512 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. 
limit=12.0 2023-11-27 03:48:47,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3713206.6666666665, ans=0.0 2023-11-27 03:48:53,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3713273.3333333335, ans=0.1 2023-11-27 03:48:56,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-11-27 03:48:59,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2023-11-27 03:49:01,539 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-27 03:49:05,017 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3900, loss[loss=0.07064, simple_loss=0.08079, pruned_loss=0.01816, audio_tagging_loss=0.01208, over 15839.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08946, pruned_loss=0.012, audio_tagging_loss=0.008759, over 3051020.68 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:49:07,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3713340.0, ans=0.125 2023-11-27 03:49:07,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3713340.0, ans=0.125 2023-11-27 03:49:11,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3713340.0, ans=0.0 2023-11-27 03:49:16,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 9.076e+01 9.575e+01 1.020e+02 1.197e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 03:49:22,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3713406.6666666665, ans=0.125 2023-11-27 03:49:44,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3713540.0, ans=0.035 2023-11-27 03:49:46,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3713540.0, ans=0.2 2023-11-27 03:49:57,108 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-27 03:50:00,205 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3950, loss[loss=0.07134, simple_loss=0.09568, pruned_loss=0.01271, audio_tagging_loss=0.01079, over 15133.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.0903, pruned_loss=0.01218, audio_tagging_loss=0.008755, over 3059055.38 frames. 
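Note on the `Exclude cut` warnings: each excluded AudioSet cut carries 100 feature frames, which the 4x subsampling front end reduces to 23, fewer than its 24 BPE tokens, so transducer training cannot align it. The logged 100 -> 23 pair is consistent with the usual ((T - 7) // 2 + 1) // 2 subsampling arithmetic, which this sketch assumes (names illustrative):

```python
# Sketch of the exclusion test implied by the WARNING lines (names illustrative).
def frames_after_subsampling(t: int) -> int:
    # Matches the logged pair: t=100 -> 23.
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer training needs at least one frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as in the log
```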
], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:50:04,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3713673.3333333335, ans=0.125 2023-11-27 03:50:24,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3713806.6666666665, ans=0.1 2023-11-27 03:50:35,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3713873.3333333335, ans=6.0 2023-11-27 03:50:52,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-27 03:50:56,110 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4000, loss[loss=0.06335, simple_loss=0.08271, pruned_loss=0.01078, audio_tagging_loss=0.01121, over 14808.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09063, pruned_loss=0.01213, audio_tagging_loss=0.008768, over 3053462.39 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:51:08,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.327e+01 9.767e+01 1.033e+02 1.414e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 03:51:34,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3714206.6666666665, ans=0.125 2023-11-27 03:51:38,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3714206.6666666665, ans=0.125 2023-11-27 03:51:48,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-27 03:51:52,052 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4050, loss[loss=0.0556, simple_loss=0.08337, pruned_loss=0.008029, audio_tagging_loss=0.005889, over 15755.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09071, pruned_loss=0.01227, audio_tagging_loss=0.008742, over 3049590.13 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:51:53,584 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2023-11-27 03:51:54,212 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:52:44,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-27 03:52:47,541 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4100, loss[loss=0.09907, simple_loss=0.139, pruned_loss=0.02408, audio_tagging_loss=0.005495, over 14800.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09066, pruned_loss=0.01216, audio_tagging_loss=0.008787, over 3053422.59 frames. 
], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:52:47,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3714673.3333333335, ans=0.125 2023-11-27 03:52:58,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3714740.0, ans=0.125 2023-11-27 03:52:59,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 9.062e+01 9.739e+01 1.030e+02 1.331e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:53:04,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3714740.0, ans=0.1 2023-11-27 03:53:08,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3714740.0, ans=0.125 2023-11-27 03:53:40,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-27 03:53:40,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3714940.0, ans=0.0 2023-11-27 03:53:43,504 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4150, loss[loss=0.05333, simple_loss=0.07003, pruned_loss=0.006762, audio_tagging_loss=0.01155, over 14083.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09062, pruned_loss=0.01214, audio_tagging_loss=0.008718, over 3045768.50 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:53:48,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3715006.6666666665, ans=0.0 2023-11-27 03:54:23,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-27 03:54:24,029 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:54:24,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3715206.6666666665, ans=0.1 2023-11-27 03:54:27,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3715273.3333333335, ans=0.09899494936611666 2023-11-27 03:54:33,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3715273.3333333335, ans=0.125 2023-11-27 03:54:36,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-27 03:54:39,856 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4200, loss[loss=0.06261, simple_loss=0.0922, pruned_loss=0.01024, audio_tagging_loss=0.006277, over 15382.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08936, pruned_loss=0.01199, audio_tagging_loss=0.008663, over 3053577.43 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:54:40,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3715340.0, ans=0.125 2023-11-27 03:54:44,589 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0 2023-11-27 03:54:51,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.957e+01 9.619e+01 1.045e+02 2.364e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-27 03:55:00,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3715473.3333333335, ans=0.125 2023-11-27 03:55:20,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2023-11-27 03:55:22,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3715540.0, ans=0.2 2023-11-27 03:55:22,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3715540.0, ans=0.04949747468305833 2023-11-27 03:55:31,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3715606.6666666665, ans=0.0 2023-11-27 03:55:32,777 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-27 03:55:35,908 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4250, loss[loss=0.06504, simple_loss=0.08935, pruned_loss=0.0112, audio_tagging_loss=0.009167, over 14892.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08949, pruned_loss=0.01188, audio_tagging_loss=0.008565, over 3054623.69 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:55:40,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3715673.3333333335, ans=0.125 2023-11-27 03:55:41,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3715673.3333333335, ans=0.125 2023-11-27 03:55:44,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3715673.3333333335, ans=0.125 2023-11-27 03:55:59,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3715806.6666666665, ans=0.2 2023-11-27 03:55:59,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-27 03:56:13,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3715873.3333333335, ans=0.2 2023-11-27 03:56:14,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3715873.3333333335, ans=0.125 2023-11-27 03:56:14,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. 
limit=15.0 2023-11-27 03:56:17,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3715873.3333333335, ans=0.125 2023-11-27 03:56:17,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3715873.3333333335, ans=0.09899494936611666 2023-11-27 03:56:21,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2023-11-27 03:56:28,278 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-27 03:56:31,982 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4300, loss[loss=0.05246, simple_loss=0.06946, pruned_loss=0.00916, audio_tagging_loss=0.008576, over 14666.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08924, pruned_loss=0.01174, audio_tagging_loss=0.008504, over 3051308.34 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:56:39,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3716006.6666666665, ans=0.125 2023-11-27 03:56:43,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3716073.3333333335, ans=0.2 2023-11-27 03:56:44,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.063e+01 9.742e+01 1.048e+02 1.434e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:56:49,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3716073.3333333335, ans=0.125 2023-11-27 03:57:01,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3716140.0, ans=0.0 2023-11-27 03:57:06,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3716206.6666666665, ans=0.125 2023-11-27 03:57:09,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3716206.6666666665, ans=0.1 2023-11-27 03:57:14,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3716206.6666666665, ans=0.0 2023-11-27 03:57:15,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3716273.3333333335, ans=0.125 2023-11-27 03:57:24,953 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-27 03:57:28,057 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4350, loss[loss=0.06546, simple_loss=0.0906, pruned_loss=0.01094, audio_tagging_loss=0.009223, over 15237.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08965, pruned_loss=0.01171, audio_tagging_loss=0.008435, over 3050084.41 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:57:32,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.10 vs. 
limit=22.5 2023-11-27 03:57:33,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3716340.0, ans=0.125 2023-11-27 03:57:36,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3716340.0, ans=0.2 2023-11-27 03:57:58,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2023-11-27 03:58:20,054 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-27 03:58:23,188 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4400, loss[loss=0.06607, simple_loss=0.08449, pruned_loss=0.01417, audio_tagging_loss=0.009657, over 15435.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09056, pruned_loss=0.01205, audio_tagging_loss=0.008373, over 3050345.99 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:58:34,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3716740.0, ans=0.125 2023-11-27 03:58:35,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 9.094e+01 9.740e+01 1.025e+02 1.251e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:58:58,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3716873.3333333335, ans=0.0 2023-11-27 03:58:59,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3716873.3333333335, ans=0.125 2023-11-27 03:59:15,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557550 2023-11-27 03:59:18,597 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4450, loss[loss=0.06038, simple_loss=0.08583, pruned_loss=0.008123, audio_tagging_loss=0.009346, over 15839.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09128, pruned_loss=0.01215, audio_tagging_loss=0.008387, over 3057077.62 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:59:39,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3717073.3333333335, ans=0.0 2023-11-27 04:00:11,595 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557600 2023-11-27 04:00:15,473 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4500, loss[loss=0.09446, simple_loss=0.1404, pruned_loss=0.01979, audio_tagging_loss=0.004501, over 15805.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09151, pruned_loss=0.01216, audio_tagging_loss=0.008269, over 3059616.51 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:00:27,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.128e+01 9.724e+01 1.026e+02 1.221e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 04:00:37,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.81 vs. 
limit=15.0 2023-11-27 04:00:48,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3717540.0, ans=0.0 2023-11-27 04:00:49,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3717540.0, ans=0.125 2023-11-27 04:00:53,580 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:00:58,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3717540.0, ans=0.05 2023-11-27 04:01:02,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3717606.6666666665, ans=0.0 2023-11-27 04:01:07,864 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557650 2023-11-27 04:01:11,044 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4550, loss[loss=0.07132, simple_loss=0.104, pruned_loss=0.01412, audio_tagging_loss=0.005203, over 15333.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.09022, pruned_loss=0.01197, audio_tagging_loss=0.008423, over 3058704.20 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:01:19,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3717673.3333333335, ans=10.0 2023-11-27 04:01:23,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3717740.0, ans=0.0 2023-11-27 04:01:27,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2023-11-27 04:01:53,957 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:01:55,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3717940.0, ans=0.04949747468305833 2023-11-27 04:01:56,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3717940.0, ans=0.125 2023-11-27 04:02:01,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3717940.0, ans=0.125 2023-11-27 04:02:03,768 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557700 2023-11-27 04:02:07,407 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4600, loss[loss=0.06978, simple_loss=0.09484, pruned_loss=0.01426, audio_tagging_loss=0.008102, over 13750.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08965, pruned_loss=0.01211, audio_tagging_loss=0.008589, over 3050086.63 frames. 
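Note on the `ScheduledFloat` records: these print the current value (`ans`) of hyperparameters such as dropout probabilities, skip rates, and balancer probs that are scheduled as functions of `batch_count`; this late in training most skip rates have decayed to 0.0. A standalone sketch of a piecewise-linear schedule in that spirit (hypothetical breakpoints, not scaling.py itself):

```python
# Piecewise-linear schedule over batch_count, in the spirit of the
# ScheduledFloat values printed above (standalone sketch, not scaling.py).
from bisect import bisect_right

class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate decaying from 0.3 to 0.0 over the first 20k batches
# (hypothetical breakpoints):
skip_rate = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.0))
print(skip_rate(3713540.0))  # 0.0 late in training, like many `ans=0.0` lines
```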
], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:02:18,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3718073.3333333335, ans=0.125 2023-11-27 04:02:19,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3718073.3333333335, ans=0.0 2023-11-27 04:02:20,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.735e+01 9.058e+01 9.556e+01 1.027e+02 1.489e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 04:02:43,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3718206.6666666665, ans=0.125 2023-11-27 04:02:53,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3718273.3333333335, ans=0.125 2023-11-27 04:03:00,549 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557750 2023-11-27 04:03:04,146 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4650, loss[loss=0.08451, simple_loss=0.1133, pruned_loss=0.02086, audio_tagging_loss=0.007004, over 15279.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08958, pruned_loss=0.01221, audio_tagging_loss=0.008726, over 3051966.33 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:03:05,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-27 04:03:20,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3718406.6666666665, ans=0.125 2023-11-27 04:03:56,138 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557800 2023-11-27 04:03:56,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3718606.6666666665, ans=0.09899494936611666 2023-11-27 04:03:56,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=12.0 2023-11-27 04:03:59,559 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4700, loss[loss=0.05602, simple_loss=0.08303, pruned_loss=0.005779, audio_tagging_loss=0.008729, over 15185.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08927, pruned_loss=0.01217, audio_tagging_loss=0.008714, over 3051808.41 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:04:05,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. 
limit=15.0 2023-11-27 04:04:07,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3718673.3333333335, ans=0.1 2023-11-27 04:04:12,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 9.072e+01 9.943e+01 1.043e+02 1.382e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 04:04:13,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3718740.0, ans=0.2 2023-11-27 04:04:18,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3718740.0, ans=0.0 2023-11-27 04:04:24,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3718806.6666666665, ans=0.125 2023-11-27 04:04:36,834 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-27 04:04:51,008 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557850 2023-11-27 04:04:51,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3718940.0, ans=0.0 2023-11-27 04:04:54,123 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4750, loss[loss=0.06374, simple_loss=0.0827, pruned_loss=0.01111, audio_tagging_loss=0.01128, over 15661.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08821, pruned_loss=0.01188, audio_tagging_loss=0.008816, over 3042469.41 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:05:12,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3719073.3333333335, ans=0.1 2023-11-27 04:05:16,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3719140.0, ans=0.125 2023-11-27 04:05:23,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3719140.0, ans=0.125 2023-11-27 04:05:47,627 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557900 2023-11-27 04:05:50,732 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4800, loss[loss=0.06867, simple_loss=0.09181, pruned_loss=0.009429, audio_tagging_loss=0.01334, over 15120.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08848, pruned_loss=0.01184, audio_tagging_loss=0.008842, over 3040046.51 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:05:56,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3719340.0, ans=0.025 2023-11-27 04:06:05,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.052e+01 9.526e+01 1.032e+02 1.738e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 04:06:22,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3719540.0, ans=0.2 2023-11-27 04:06:31,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3719540.0, ans=10.0 2023-11-27 04:06:42,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3719606.6666666665, ans=0.125 2023-11-27 04:06:43,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557950 2023-11-27 04:06:46,395 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4850, loss[loss=0.07572, simple_loss=0.1113, pruned_loss=0.01252, audio_tagging_loss=0.007538, over 15634.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08777, pruned_loss=0.01182, audio_tagging_loss=0.00897, over 3042010.05 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:06:59,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3719740.0, ans=0.0 2023-11-27 04:07:00,530 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:07:01,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3719740.0, ans=0.125 2023-11-27 04:07:06,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3719740.0, ans=0.125 2023-11-27 04:07:15,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3719806.6666666665, ans=0.125 2023-11-27 04:07:38,010 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558000 2023-11-27 04:07:41,443 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4900, loss[loss=0.06908, simple_loss=0.09567, pruned_loss=0.01288, audio_tagging_loss=0.008368, over 16973.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08778, pruned_loss=0.01194, audio_tagging_loss=0.008951, over 3043591.35 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:07:48,458 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:07:54,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. 
limit=15.0 2023-11-27 04:07:56,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.861e+01 9.533e+01 1.009e+02 1.253e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 04:08:14,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3720206.6666666665, ans=0.0 2023-11-27 04:08:33,856 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558050 2023-11-27 04:08:37,502 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4950, loss[loss=0.05209, simple_loss=0.06464, pruned_loss=0.007774, audio_tagging_loss=0.012, over 14297.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08809, pruned_loss=0.01194, audio_tagging_loss=0.008792, over 3041862.65 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:08:46,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3720340.0, ans=0.125 2023-11-27 04:08:58,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3720473.3333333335, ans=0.125 2023-11-27 04:09:02,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3720473.3333333335, ans=0.0 2023-11-27 04:09:31,055 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558100 2023-11-27 04:09:34,160 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5000, loss[loss=0.06836, simple_loss=0.09319, pruned_loss=0.01089, audio_tagging_loss=0.01087, over 15463.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08812, pruned_loss=0.01169, audio_tagging_loss=0.008621, over 3039559.18 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:09:34,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3720673.3333333335, ans=0.125 2023-11-27 04:09:43,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2023-11-27 04:09:47,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.780e+01 9.541e+01 1.005e+02 1.452e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 04:09:50,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3720740.0, ans=0.1 2023-11-27 04:10:09,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3720873.3333333335, ans=0.0 2023-11-27 04:10:16,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3720873.3333333335, ans=0.125 2023-11-27 04:10:25,988 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558150 2023-11-27 04:10:26,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3720940.0, ans=0.1 2023-11-27 04:10:29,095 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5050, loss[loss=0.06002, simple_loss=0.08667, pruned_loss=0.0106, audio_tagging_loss=0.006086, over 13951.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08863, pruned_loss=0.01181, audio_tagging_loss=0.008469, over 3039419.96 frames. 
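Note on the `Whitening` records: one plausible reading of the `metric=X vs. limit=Y` values is an eigenvalue-spread statistic of the feature covariance that equals 1.0 for perfectly whitened activations and grows as the eigenvalues spread out, with a corrective gradient applied only when the metric exceeds the limit. A sketch of such a metric (my reading, not scaling.py's implementation):

```python
# Sketch of an eigenvalue-spread "whitening" metric like the values logged
# above (my reading, not scaling.py itself): 1.0 means perfectly white
# features; larger values mean more spread-out covariance eigenvalues.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups.
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    xg = x.reshape(num_frames, num_groups, d).permute(1, 0, 2)  # (g, n, d)
    xg = xg - xg.mean(dim=1, keepdim=True)
    cov = xg.transpose(1, 2) @ xg / num_frames                  # (g, d, d)
    tr_c = cov.diagonal(dim1=1, dim2=2).sum(dim=1)              # trace(C)
    tr_c2 = (cov * cov.transpose(1, 2)).sum(dim=(1, 2))         # trace(C @ C)
    # mean(eig^2) / mean(eig)^2 == d * tr(C^2) / tr(C)^2, which is 1.0
    # iff all eigenvalues are equal (fully white).
    return (d * tr_c2 / tr_c.pow(2)).mean().item()

x = torch.randn(1000, 512)                                  # roughly white
print(whitening_metric(x))                                  # close to 1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 512)))  # larger spread
```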
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:10:30,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3721006.6666666665, ans=0.125 2023-11-27 04:10:54,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3721140.0, ans=0.95 2023-11-27 04:11:00,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3721140.0, ans=0.125 2023-11-27 04:11:21,941 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558200 2023-11-27 04:11:25,285 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5100, loss[loss=0.04344, simple_loss=0.05447, pruned_loss=0.007053, audio_tagging_loss=0.009149, over 15114.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.0878, pruned_loss=0.01168, audio_tagging_loss=0.008514, over 3039714.87 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:11:40,665 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.877e+01 9.521e+01 1.045e+02 1.362e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:11:41,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3721406.6666666665, ans=0.125 2023-11-27 04:11:42,130 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0 2023-11-27 04:11:44,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3721406.6666666665, ans=0.125 2023-11-27 04:12:01,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3721540.0, ans=0.125 2023-11-27 04:12:11,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2023-11-27 04:12:19,042 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558250 2023-11-27 04:12:22,227 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5150, loss[loss=0.06885, simple_loss=0.09704, pruned_loss=0.01305, audio_tagging_loss=0.007283, over 16154.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08836, pruned_loss=0.01181, audio_tagging_loss=0.008472, over 3035938.12 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:12:39,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3721740.0, ans=0.125 2023-11-27 04:12:39,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3721740.0, ans=0.09899494936611666 2023-11-27 04:12:47,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3721806.6666666665, ans=0.125 2023-11-27 04:12:53,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3721806.6666666665, ans=0.0 2023-11-27 04:12:55,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3721873.3333333335, ans=0.125 2023-11-27 04:12:55,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.98 vs. limit=15.0 2023-11-27 04:13:14,243 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558300 2023-11-27 04:13:17,294 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5200, loss[loss=0.05999, simple_loss=0.07355, pruned_loss=0.01505, audio_tagging_loss=0.008171, over 14661.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08846, pruned_loss=0.01181, audio_tagging_loss=0.008394, over 3043287.39 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:13:25,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3722006.6666666665, ans=0.0 2023-11-27 04:13:31,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.871e+01 9.548e+01 1.019e+02 1.156e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 04:13:40,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2023-11-27 04:13:57,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3722206.6666666665, ans=0.2 2023-11-27 04:14:09,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558350 2023-11-27 04:14:12,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3722340.0, ans=0.2 2023-11-27 04:14:13,081 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5250, loss[loss=0.04463, simple_loss=0.06282, pruned_loss=0.005318, audio_tagging_loss=0.007906, over 15587.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.0885, pruned_loss=0.01176, audio_tagging_loss=0.008442, over 3048469.09 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:14:35,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3722473.3333333335, ans=0.0 2023-11-27 04:14:54,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3722540.0, ans=0.0 2023-11-27 04:14:55,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3722540.0, ans=0.125 2023-11-27 04:15:02,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. 
limit=15.0 2023-11-27 04:15:03,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3722606.6666666665, ans=0.1 2023-11-27 04:15:06,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558400 2023-11-27 04:15:09,441 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5300, loss[loss=0.06046, simple_loss=0.08412, pruned_loss=0.01009, audio_tagging_loss=0.00831, over 15475.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08923, pruned_loss=0.01192, audio_tagging_loss=0.008383, over 3045759.80 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:15:11,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3722673.3333333335, ans=0.0 2023-11-27 04:15:17,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3722673.3333333335, ans=0.125 2023-11-27 04:15:24,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.122e+01 9.674e+01 1.051e+02 1.467e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:15:27,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3722740.0, ans=0.0 2023-11-27 04:15:31,596 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:15:39,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3722806.6666666665, ans=0.1 2023-11-27 04:15:41,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3722873.3333333335, ans=0.2 2023-11-27 04:16:00,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3722940.0, ans=0.0 2023-11-27 04:16:02,265 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558450 2023-11-27 04:16:05,421 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5350, loss[loss=0.07924, simple_loss=0.1169, pruned_loss=0.01499, audio_tagging_loss=0.005789, over 15342.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.09023, pruned_loss=0.01194, audio_tagging_loss=0.008347, over 3047074.57 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:16:07,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3723006.6666666665, ans=0.5 2023-11-27 04:16:08,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3723006.6666666665, ans=0.0 2023-11-27 04:16:24,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3723073.3333333335, ans=0.0 2023-11-27 04:16:57,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558500 2023-11-27 04:17:00,474 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5400, loss[loss=0.05904, simple_loss=0.0812, pruned_loss=0.009568, audio_tagging_loss=0.008874, over 14858.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09096, pruned_loss=0.01204, audio_tagging_loss=0.008452, over 3050140.11 frames. 
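Each optim.py entry above prints five order statistics of recent gradient norms (min, 25%, 50%, 75%, max) plus a clipping threshold that tracks Clipping_scale times the logged median (2.0 × 9.674e+01 ≈ 1.935e+02 in the nearest entry). A sketch of that bookkeeping under those assumptions; the buffer size and clipping mechanics are illustrative, not the optimizer's actual code:

```python
# Hedged sketch: threshold = clipping_scale * running median of recent norms,
# matching the relationship visible in the logged quartiles above.

from collections import deque
import torch

class QuartileClipper:
    """Clip gradients against a threshold derived from recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # rolling window of grad norms

    def clip_(self, parameters) -> dict:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
        self.norms.append(norm.item())
        ordered = sorted(self.norms)
        n = len(ordered) - 1
        quartiles = [ordered[round(q * n)] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # scale * median
        clipped = norm.item() > threshold
        if clipped:
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale to the threshold
        return {"quartiles": quartiles, "threshold": threshold,
                "percent_clipped": 100.0 * clipped}
```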
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:17:09,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3723340.0, ans=0.125 2023-11-27 04:17:16,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.000e+01 9.552e+01 1.029e+02 2.043e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-27 04:17:18,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3723406.6666666665, ans=0.125 2023-11-27 04:17:44,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3723606.6666666665, ans=0.125 2023-11-27 04:17:52,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3723606.6666666665, ans=0.125 2023-11-27 04:17:53,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558550 2023-11-27 04:17:57,345 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5450, loss[loss=0.04845, simple_loss=0.05779, pruned_loss=0.00947, audio_tagging_loss=0.01008, over 14401.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09117, pruned_loss=0.01232, audio_tagging_loss=0.008484, over 3046572.15 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:18:12,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3723740.0, ans=0.5 2023-11-27 04:18:21,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2023-11-27 04:18:37,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=22.5 2023-11-27 04:18:48,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3723940.0, ans=0.0 2023-11-27 04:18:49,227 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558600 2023-11-27 04:18:52,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2023-11-27 04:18:52,633 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5500, loss[loss=0.05382, simple_loss=0.06329, pruned_loss=0.009689, audio_tagging_loss=0.01248, over 15175.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09194, pruned_loss=0.01247, audio_tagging_loss=0.00847, over 3052983.07 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:18:54,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-27 04:18:55,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-27 04:18:55,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-27 04:18:57,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3724006.6666666665, ans=0.0 2023-11-27 04:18:57,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-27 04:18:59,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-27 04:19:07,815 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.127e+01 9.786e+01 1.057e+02 1.357e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-27 04:19:13,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3724140.0, ans=0.0 2023-11-27 04:19:20,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2023-11-27 04:19:45,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558650 2023-11-27 04:19:48,351 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5550, loss[loss=0.04509, simple_loss=0.05626, pruned_loss=0.006211, audio_tagging_loss=0.01075, over 17018.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09134, pruned_loss=0.01227, audio_tagging_loss=0.008644, over 3058585.67 frames. ], batch size: 66, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:19:49,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=15.0 2023-11-27 04:19:55,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3724340.0, ans=0.5 2023-11-27 04:20:07,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3724406.6666666665, ans=0.125 2023-11-27 04:20:08,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3724406.6666666665, ans=0.125 2023-11-27 04:20:10,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3724473.3333333335, ans=0.125 2023-11-27 04:20:26,252 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.66 vs. 
limit=15.0 2023-11-27 04:20:26,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3724540.0, ans=0.125 2023-11-27 04:20:35,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3724606.6666666665, ans=0.1 2023-11-27 04:20:41,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558700 2023-11-27 04:20:44,609 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5600, loss[loss=0.08468, simple_loss=0.1216, pruned_loss=0.01501, audio_tagging_loss=0.008881, over 15889.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09156, pruned_loss=0.01226, audio_tagging_loss=0.008659, over 3053874.50 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:20:52,895 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:20:53,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2023-11-27 04:21:00,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 9.029e+01 9.775e+01 1.051e+02 1.247e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 04:21:00,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3724740.0, ans=0.0 2023-11-27 04:21:14,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3724806.6666666665, ans=0.0 2023-11-27 04:21:20,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3724873.3333333335, ans=0.0 2023-11-27 04:21:23,804 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:21:35,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3724940.0, ans=0.125 2023-11-27 04:21:37,147 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558750 2023-11-27 04:21:37,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3724940.0, ans=0.0 2023-11-27 04:21:38,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3724940.0, ans=0.07 2023-11-27 04:21:39,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2023-11-27 04:21:40,273 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5650, loss[loss=0.05929, simple_loss=0.06744, pruned_loss=0.01542, audio_tagging_loss=0.01015, over 14965.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09123, pruned_loss=0.01222, audio_tagging_loss=0.008644, over 3056963.10 frames. 
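The Whitening entries compare a per-module statistic against a limit (e.g. metric=12.47 vs. limit=15.0 above). One standard whiteness measure with the right behavior is d·tr(C²)/tr(C)² over the feature covariance C: it is 1.0 for isotropic features and approaches the channel count d when a single direction dominates. The sketch below uses it as a plausible stand-in, not as a claim about scaling.py's exact formula:

```python
# Plausible proxy for the logged whitening metric: d * tr(C^2) / tr(C)^2.
# Equals 1.0 when the covariance is proportional to identity (white
# features) and grows toward num_channels as features become correlated.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one group."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]          # (d, d) feature covariance
    d = cov.shape[0]
    return (d * (cov @ cov).trace() / cov.trace() ** 2).item()

white = torch.randn(10000, 256)
print(whitening_metric(white))              # ~1.0 for white features
print(whitening_metric(white * torch.linspace(0.1, 3.0, 256)))  # > 1.0
```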
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:21:53,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3725073.3333333335, ans=0.125 2023-11-27 04:22:12,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3725140.0, ans=0.125 2023-11-27 04:22:33,030 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558800 2023-11-27 04:22:35,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3725340.0, ans=0.125 2023-11-27 04:22:36,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0 2023-11-27 04:22:36,431 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5700, loss[loss=0.07197, simple_loss=0.1004, pruned_loss=0.01291, audio_tagging_loss=0.008888, over 14591.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09012, pruned_loss=0.01206, audio_tagging_loss=0.008753, over 3061735.30 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:22:53,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 9.100e+01 9.627e+01 1.012e+02 1.597e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-27 04:23:23,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3725606.6666666665, ans=0.125 2023-11-27 04:23:28,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558850 2023-11-27 04:23:28,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3725606.6666666665, ans=0.125 2023-11-27 04:23:32,479 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5750, loss[loss=0.06624, simple_loss=0.09113, pruned_loss=0.01227, audio_tagging_loss=0.008414, over 15201.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08944, pruned_loss=0.01192, audio_tagging_loss=0.008701, over 3058638.44 frames. 
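The per-batch loss breakdowns in these entries are consistent with a fixed weighting of the three components: for batch 5750 above, 0.5 × 0.09113 + 0.01227 + 0.008414 ≈ 0.06624, the logged total. A sketch of that combination; the 0.5 simple-loss weight and 1.0 audio-tagging weight are inferred from the logged numbers, and the loss functions themselves are stand-ins:

```python
# Loss combination consistent with the logged breakdowns:
#   loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Reproduces the batch 5750 entry above to within logging precision.
assert abs(combine_losses(0.09113, 0.01227, 0.008414) - 0.06624) < 1e-4
```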
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:23:34,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3725673.3333333335, ans=0.2 2023-11-27 04:23:36,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3725673.3333333335, ans=0.0 2023-11-27 04:23:44,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3725740.0, ans=0.1 2023-11-27 04:23:50,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3725740.0, ans=0.125 2023-11-27 04:23:54,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3725806.6666666665, ans=0.125 2023-11-27 04:23:54,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3725806.6666666665, ans=0.0 2023-11-27 04:24:01,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3725806.6666666665, ans=0.125 2023-11-27 04:24:06,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3725873.3333333335, ans=0.125 2023-11-27 04:24:11,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3725873.3333333335, ans=0.125 2023-11-27 04:24:12,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3725873.3333333335, ans=0.125 2023-11-27 04:24:25,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558900 2023-11-27 04:24:28,362 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5800, loss[loss=0.05773, simple_loss=0.07214, pruned_loss=0.01064, audio_tagging_loss=0.01103, over 15246.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08878, pruned_loss=0.0119, audio_tagging_loss=0.008687, over 3050808.88 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:24:29,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3726006.6666666665, ans=0.0 2023-11-27 04:24:44,055 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.898e+01 9.503e+01 1.014e+02 1.698e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-27 04:24:52,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-11-27 04:24:58,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.33 vs. 
limit=22.5 2023-11-27 04:25:08,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3726206.6666666665, ans=0.125 2023-11-27 04:25:10,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3726206.6666666665, ans=0.1 2023-11-27 04:25:11,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3726273.3333333335, ans=0.0 2023-11-27 04:25:20,295 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558950 2023-11-27 04:25:23,513 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5850, loss[loss=0.06263, simple_loss=0.08954, pruned_loss=0.01125, audio_tagging_loss=0.006613, over 15249.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08948, pruned_loss=0.01194, audio_tagging_loss=0.008554, over 3050496.65 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:25:31,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3726340.0, ans=0.125 2023-11-27 04:25:43,434 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:25:48,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-27 04:26:09,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3726606.6666666665, ans=0.125 2023-11-27 04:26:10,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3726606.6666666665, ans=0.0 2023-11-27 04:26:16,690 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559000 2023-11-27 04:26:20,663 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5900, loss[loss=0.06487, simple_loss=0.09434, pruned_loss=0.01194, audio_tagging_loss=0.005758, over 15712.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0891, pruned_loss=0.01179, audio_tagging_loss=0.008474, over 3047891.84 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:26:22,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3726673.3333333335, ans=0.2 2023-11-27 04:26:26,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3726673.3333333335, ans=0.125 2023-11-27 04:26:33,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3726740.0, ans=0.125 2023-11-27 04:26:37,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 9.105e+01 9.736e+01 1.055e+02 1.471e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 04:26:39,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3726740.0, ans=0.025 2023-11-27 04:26:43,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3726806.6666666665, ans=0.125 2023-11-27 04:26:54,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.97 vs. 
limit=15.0 2023-11-27 04:26:56,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-27 04:27:11,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3726940.0, ans=0.125 2023-11-27 04:27:13,053 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559050 2023-11-27 04:27:16,210 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5950, loss[loss=0.06307, simple_loss=0.08969, pruned_loss=0.009638, audio_tagging_loss=0.008582, over 15524.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08935, pruned_loss=0.01186, audio_tagging_loss=0.008462, over 3051829.62 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:27:23,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3727006.6666666665, ans=10.0 2023-11-27 04:28:07,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559100 2023-11-27 04:28:10,858 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6000, loss[loss=0.09609, simple_loss=0.136, pruned_loss=0.02316, audio_tagging_loss=0.004952, over 15178.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08964, pruned_loss=0.01191, audio_tagging_loss=0.008419, over 3050318.66 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:28:10,858 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 04:28:25,734 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0664, 2.6733, 1.7226, 2.6694, 3.2283, 3.2854, 3.2188, 3.5088], device='cuda:2') 2023-11-27 04:28:43,427 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05733, simple_loss=0.05048, pruned_loss=0.005338, audio_tagging_loss=0.02675, over 4681554.00 frames. 2023-11-27 04:28:43,427 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 04:28:59,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.039e+01 9.599e+01 1.058e+02 1.819e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 04:29:01,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3727406.6666666665, ans=0.0 2023-11-27 04:29:21,499 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:29:35,855 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559150 2023-11-27 04:29:35,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3727606.6666666665, ans=0.125 2023-11-27 04:29:39,054 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6050, loss[loss=0.07378, simple_loss=0.1078, pruned_loss=0.01291, audio_tagging_loss=0.006967, over 16192.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08952, pruned_loss=0.01187, audio_tagging_loss=0.008493, over 3056025.26 frames. 
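The entries at batch 6000 above interleave a full validation pass with training and then report peak GPU memory; the cadence is consistent with validating every 3000 batches. A schematic of that pattern, where the interval, model, and dataloader are assumptions rather than the script's actual objects:

```python
# Schematic validate-every-N-batches loop matching the batch 6000 entries.

import logging
import torch

VALID_INTERVAL = 3000  # assumed batches between validation passes

def maybe_validate(model, valid_dl, batch_idx: int, device) -> None:
    if batch_idx == 0 or batch_idx % VALID_INTERVAL != 0:
        return
    logging.info("Computing validation loss")
    model.eval()
    with torch.no_grad():
        for batch in valid_dl:
            pass  # accumulate validation losses here, then log the average
    model.train()
    if torch.cuda.is_available():
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        logging.info(f"Maximum memory allocated so far is {mem_mb}MB")
```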
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:29:51,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3727740.0, ans=0.95 2023-11-27 04:30:05,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3727806.6666666665, ans=0.1 2023-11-27 04:30:31,208 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559200 2023-11-27 04:30:34,600 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6100, loss[loss=0.05079, simple_loss=0.06421, pruned_loss=0.007343, audio_tagging_loss=0.01134, over 15545.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08955, pruned_loss=0.01206, audio_tagging_loss=0.008492, over 3051999.94 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:30:53,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.839e+01 9.378e+01 9.986e+01 1.390e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 04:30:59,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3728140.0, ans=0.125 2023-11-27 04:31:07,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3728206.6666666665, ans=0.0 2023-11-27 04:31:13,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3728206.6666666665, ans=0.0 2023-11-27 04:31:17,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3728273.3333333335, ans=0.125 2023-11-27 04:31:26,136 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559250 2023-11-27 04:31:30,229 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6150, loss[loss=0.05447, simple_loss=0.07108, pruned_loss=0.007071, audio_tagging_loss=0.01186, over 13862.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08933, pruned_loss=0.01214, audio_tagging_loss=0.008463, over 3047745.11 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:31:55,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.39 vs. limit=10.0 2023-11-27 04:32:01,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3728473.3333333335, ans=0.0 2023-11-27 04:32:16,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3728606.6666666665, ans=0.2 2023-11-27 04:32:22,918 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559300 2023-11-27 04:32:26,076 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6200, loss[loss=0.04723, simple_loss=0.06221, pruned_loss=0.005681, audio_tagging_loss=0.01044, over 14772.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08859, pruned_loss=0.01199, audio_tagging_loss=0.008585, over 3041006.44 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:32:27,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.87 vs. 
limit=10.0 2023-11-27 04:32:42,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.961e+01 9.532e+01 1.029e+02 1.294e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:32:55,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3728806.6666666665, ans=0.1 2023-11-27 04:33:12,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3728940.0, ans=0.2 2023-11-27 04:33:17,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559350 2023-11-27 04:33:21,001 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6250, loss[loss=0.06133, simple_loss=0.08007, pruned_loss=0.01005, audio_tagging_loss=0.01124, over 16381.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08803, pruned_loss=0.01193, audio_tagging_loss=0.00872, over 3041322.19 frames. ], batch size: 66, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:33:22,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3729006.6666666665, ans=0.125 2023-11-27 04:33:34,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3729073.3333333335, ans=0.0 2023-11-27 04:33:57,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2023-11-27 04:34:04,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3729273.3333333335, ans=0.95 2023-11-27 04:34:07,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3729273.3333333335, ans=0.0 2023-11-27 04:34:13,100 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559400 2023-11-27 04:34:16,463 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6300, loss[loss=0.05151, simple_loss=0.06539, pruned_loss=0.008165, audio_tagging_loss=0.01064, over 14364.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08796, pruned_loss=0.01191, audio_tagging_loss=0.008741, over 3039788.18 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:34:21,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3729340.0, ans=0.0 2023-11-27 04:34:24,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3729340.0, ans=0.0 2023-11-27 04:34:34,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3729406.6666666665, ans=0.0 2023-11-27 04:34:35,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.982e+01 9.657e+01 1.038e+02 1.298e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 04:34:40,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2023-11-27 04:34:54,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. 
limit=10.0 2023-11-27 04:35:10,079 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559450 2023-11-27 04:35:13,208 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6350, loss[loss=0.06942, simple_loss=0.102, pruned_loss=0.01075, audio_tagging_loss=0.007672, over 15369.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.0876, pruned_loss=0.0118, audio_tagging_loss=0.008767, over 3038720.01 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:35:19,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3729673.3333333335, ans=0.0 2023-11-27 04:35:30,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3729740.0, ans=0.125 2023-11-27 04:35:38,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3729806.6666666665, ans=0.125 2023-11-27 04:35:43,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5 2023-11-27 04:36:02,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-27 04:36:05,625 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559500 2023-11-27 04:36:06,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-27 04:36:08,711 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6400, loss[loss=0.063, simple_loss=0.0916, pruned_loss=0.00961, audio_tagging_loss=0.007586, over 14662.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08782, pruned_loss=0.01179, audio_tagging_loss=0.008794, over 3038466.88 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:36:09,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3730006.6666666665, ans=0.125 2023-11-27 04:36:11,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3730006.6666666665, ans=0.125 2023-11-27 04:36:12,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3730006.6666666665, ans=0.1 2023-11-27 04:36:13,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3730006.6666666665, ans=0.2 2023-11-27 04:36:26,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.070e+01 9.519e+01 1.025e+02 1.551e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:36:33,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3730140.0, ans=0.125 2023-11-27 04:36:34,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3730140.0, ans=0.125 2023-11-27 04:36:44,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3730206.6666666665, ans=0.125 2023-11-27 04:36:48,201 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:37:00,626 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559550 2023-11-27 04:37:03,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2023-11-27 04:37:03,651 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6450, loss[loss=0.07127, simple_loss=0.09512, pruned_loss=0.01506, audio_tagging_loss=0.008654, over 14427.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08824, pruned_loss=0.01189, audio_tagging_loss=0.008897, over 3036250.15 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:37:43,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3730540.0, ans=0.04949747468305833 2023-11-27 04:37:51,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3730606.6666666665, ans=0.125 2023-11-27 04:37:55,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.63 vs. limit=10.0 2023-11-27 04:37:56,978 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559600 2023-11-27 04:37:58,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3730606.6666666665, ans=0.125 2023-11-27 04:38:00,424 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6500, loss[loss=0.06094, simple_loss=0.08299, pruned_loss=0.01191, audio_tagging_loss=0.007534, over 15046.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08774, pruned_loss=0.0118, audio_tagging_loss=0.008981, over 3044072.09 frames. 
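grad_scale in these entries moves among powers of two (32.0, 16.0, 8.0), the signature of dynamic loss scaling under fp16 training: the scaler halves the scale when a step produces inf/nan gradients and grows it again after a run of clean steps. A generic torch.cuda.amp sketch of that mechanism; the model, criterion, and optimizer here are placeholders:

```python
# Generic dynamic-loss-scaling step; get_scale() returns the value a
# "grad_scale:" log field would show.

import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()  # halves the scale on inf/nan grads, grows it later

def train_step(model, batch, criterion, optimizer) -> float:
    optimizer.zero_grad()
    with autocast():  # run forward in mixed precision
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # adjust the scale for the next step
    return scaler.get_scale()
```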
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:38:00,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3730673.3333333335, ans=0.125 2023-11-27 04:38:01,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3730673.3333333335, ans=0.125 2023-11-27 04:38:03,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3730673.3333333335, ans=0.125 2023-11-27 04:38:17,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3730740.0, ans=0.0 2023-11-27 04:38:18,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.947e+01 9.565e+01 1.041e+02 1.320e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:38:24,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3730806.6666666665, ans=0.05 2023-11-27 04:38:29,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3730806.6666666665, ans=0.0 2023-11-27 04:38:34,190 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:38:50,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=15.0 2023-11-27 04:38:53,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559650 2023-11-27 04:38:56,214 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6550, loss[loss=0.05754, simple_loss=0.08259, pruned_loss=0.00818, audio_tagging_loss=0.008066, over 15586.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08838, pruned_loss=0.0119, audio_tagging_loss=0.008732, over 3040445.59 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:38:57,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3731006.6666666665, ans=0.125 2023-11-27 04:39:28,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3731206.6666666665, ans=0.04949747468305833 2023-11-27 04:39:32,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3731206.6666666665, ans=0.125 2023-11-27 04:39:42,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3731273.3333333335, ans=0.2 2023-11-27 04:39:47,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.65 vs. limit=6.0 2023-11-27 04:39:47,929 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559700 2023-11-27 04:39:51,015 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6600, loss[loss=0.0895, simple_loss=0.1295, pruned_loss=0.01948, audio_tagging_loss=0.005249, over 16238.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08909, pruned_loss=0.01187, audio_tagging_loss=0.008577, over 3043111.20 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:39:57,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3731340.0, ans=0.125 2023-11-27 04:40:03,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.12 vs. limit=10.0 2023-11-27 04:40:09,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 8.917e+01 9.565e+01 1.016e+02 1.189e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:40:10,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3731406.6666666665, ans=0.125 2023-11-27 04:40:21,252 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2023-11-27 04:40:22,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3731473.3333333335, ans=0.125 2023-11-27 04:40:29,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3731540.0, ans=0.07 2023-11-27 04:40:34,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-27 04:40:34,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3731606.6666666665, ans=0.0 2023-11-27 04:40:44,498 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559750 2023-11-27 04:40:47,625 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6650, loss[loss=0.07015, simple_loss=0.09799, pruned_loss=0.01381, audio_tagging_loss=0.007338, over 14951.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08879, pruned_loss=0.01179, audio_tagging_loss=0.008485, over 3040774.33 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:40:49,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=12.0 2023-11-27 04:40:58,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3731740.0, ans=0.0 2023-11-27 04:41:12,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3731806.6666666665, ans=0.2 2023-11-27 04:41:21,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3731873.3333333335, ans=0.125 2023-11-27 04:41:29,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3731873.3333333335, ans=0.0 2023-11-27 04:41:39,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559800 2023-11-27 04:41:42,948 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6700, loss[loss=0.05408, simple_loss=0.07336, pruned_loss=0.0072, audio_tagging_loss=0.0102, over 15672.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08941, pruned_loss=0.01182, audio_tagging_loss=0.008457, over 3040593.92 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:41:51,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3732006.6666666665, ans=0.125 2023-11-27 04:42:03,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.829e+01 9.518e+01 1.003e+02 1.219e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:42:29,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3732273.3333333335, ans=0.5 2023-11-27 04:42:35,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559850 2023-11-27 04:42:35,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-27 04:42:38,559 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6750, loss[loss=0.06029, simple_loss=0.08678, pruned_loss=0.005979, audio_tagging_loss=0.01092, over 15941.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08907, pruned_loss=0.01186, audio_tagging_loss=0.008585, over 3040477.22 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:43:19,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3732540.0, ans=0.125 2023-11-27 04:43:21,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5 2023-11-27 04:43:24,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2023-11-27 04:43:31,370 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-27 04:43:35,034 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6800, loss[loss=0.06872, simple_loss=0.09319, pruned_loss=0.01303, audio_tagging_loss=0.009092, over 14343.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08955, pruned_loss=0.01195, audio_tagging_loss=0.00856, over 3036295.70 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:43:36,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3732673.3333333335, ans=0.125 2023-11-27 04:43:38,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3732673.3333333335, ans=0.0 2023-11-27 04:43:39,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3732673.3333333335, ans=0.125 2023-11-27 04:43:54,088 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.067e+01 9.528e+01 1.036e+02 1.458e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:43:55,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3732806.6666666665, ans=0.0 2023-11-27 04:44:05,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3732806.6666666665, ans=0.95 2023-11-27 04:44:23,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.38 vs. 
limit=15.0 2023-11-27 04:44:25,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3732940.0, ans=0.125 2023-11-27 04:44:26,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-27 04:44:29,964 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6850, loss[loss=0.04535, simple_loss=0.06002, pruned_loss=0.005489, audio_tagging_loss=0.009845, over 15844.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08954, pruned_loss=0.01191, audio_tagging_loss=0.008497, over 3035870.68 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:44:31,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733006.6666666665, ans=0.1 2023-11-27 04:44:50,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3733140.0, ans=0.0 2023-11-27 04:44:58,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3733140.0, ans=0.0 2023-11-27 04:45:00,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733140.0, ans=0.1 2023-11-27 04:45:14,191 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=22.5 2023-11-27 04:45:21,871 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-27 04:45:27,239 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6900, loss[loss=0.06782, simple_loss=0.0976, pruned_loss=0.01105, audio_tagging_loss=0.007968, over 15428.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08924, pruned_loss=0.01175, audio_tagging_loss=0.008437, over 3037503.61 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:45:27,560 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:45:29,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3733340.0, ans=0.1 2023-11-27 04:45:35,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3733340.0, ans=0.0 2023-11-27 04:45:48,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.976e+01 9.495e+01 1.031e+02 1.745e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 04:46:09,627 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 04:46:10,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-27 04:46:13,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3733606.6666666665, ans=0.2 2023-11-27 04:46:16,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3733606.6666666665, ans=0.0 2023-11-27 04:46:20,311 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-27 04:46:23,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733673.3333333335, ans=0.1 2023-11-27 04:46:23,954 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6950, loss[loss=0.06188, simple_loss=0.08323, pruned_loss=0.01022, audio_tagging_loss=0.01004, over 14493.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08871, pruned_loss=0.01169, audio_tagging_loss=0.008475, over 3030939.16 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:46:29,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3733673.3333333335, ans=10.0 2023-11-27 04:46:34,675 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0 2023-11-27 04:46:38,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3733740.0, ans=0.125 2023-11-27 04:47:16,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-27 04:47:19,468 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7000, loss[loss=0.0653, simple_loss=0.08583, pruned_loss=0.01308, audio_tagging_loss=0.009312, over 16199.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08804, pruned_loss=0.01164, audio_tagging_loss=0.008538, over 3032603.36 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:47:22,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3734006.6666666665, ans=0.1 2023-11-27 04:47:25,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3734006.6666666665, ans=0.125 2023-11-27 04:47:37,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0 2023-11-27 04:47:39,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.428e+01 1.006e+02 1.288e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 04:47:50,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3734140.0, ans=0.125 2023-11-27 04:47:53,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3734206.6666666665, ans=0.125 2023-11-27 04:47:57,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3734206.6666666665, ans=0.2 2023-11-27 04:48:05,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. 
limit=22.5 2023-11-27 04:48:11,184 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-27 04:48:14,240 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7050, loss[loss=0.07056, simple_loss=0.09152, pruned_loss=0.01572, audio_tagging_loss=0.009072, over 15505.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08879, pruned_loss=0.01183, audio_tagging_loss=0.008584, over 3033469.63 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 4.0 2023-11-27 04:48:14,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3734340.0, ans=0.125 2023-11-27 04:48:30,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3734406.6666666665, ans=0.0 2023-11-27 04:48:30,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3734406.6666666665, ans=0.2 2023-11-27 04:48:38,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3734473.3333333335, ans=0.1 2023-11-27 04:48:52,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3734540.0, ans=0.09899494936611666 2023-11-27 04:48:56,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3734540.0, ans=0.2 2023-11-27 04:48:57,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3734606.6666666665, ans=0.1 2023-11-27 04:49:06,010 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-27 04:49:10,409 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7100, loss[loss=0.05393, simple_loss=0.07039, pruned_loss=0.008499, audio_tagging_loss=0.01023, over 15514.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08827, pruned_loss=0.01174, audio_tagging_loss=0.008685, over 3035854.27 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:49:12,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-27 04:49:22,205 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2023-11-27 04:49:32,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.003e+01 9.804e+01 1.066e+02 3.214e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 04:49:59,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3734940.0, ans=0.2 2023-11-27 04:50:02,377 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-27 04:50:05,508 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7150, loss[loss=0.06029, simple_loss=0.07565, pruned_loss=0.01497, audio_tagging_loss=0.007492, over 15348.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08808, pruned_loss=0.01178, audio_tagging_loss=0.008731, over 3038514.65 frames. 
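The WARNING entries above drop 1-second AudioSet cuts whose 100 input frames shrink to 23 after subsampling, fewer than their 24 placeholder tokens; a transducer loss needs at least one encoder frame per output token. A sketch of that filter, where the exact subsampling arithmetic is an assumption that happens to reproduce 100 → 23:

```python
# Exclude cuts that are too short for the transducer loss after subsampling.

def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    # Assumed arithmetic for the 4x subsampling; yields 100 -> 23 as logged.
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

# The excluded dummy cuts above: 100 frames, 23 after subsampling, 24 tokens.
assert keep_cut(100, 24) is False
```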
], batch size: 60, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:50:15,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3735073.3333333335, ans=0.125 2023-11-27 04:50:20,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3735073.3333333335, ans=0.125 2023-11-27 04:50:57,375 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-27 04:51:00,451 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7200, loss[loss=0.05038, simple_loss=0.06218, pruned_loss=0.00899, audio_tagging_loss=0.0103, over 14693.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08849, pruned_loss=0.01185, audio_tagging_loss=0.008741, over 3044215.68 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:51:13,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3735406.6666666665, ans=0.0 2023-11-27 04:51:16,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3735406.6666666665, ans=0.125 2023-11-27 04:51:23,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.928e+01 9.518e+01 1.020e+02 1.389e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:51:37,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2023-11-27 04:51:49,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3735606.6666666665, ans=0.0 2023-11-27 04:51:52,251 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-27 04:51:52,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3735606.6666666665, ans=0.0 2023-11-27 04:51:55,940 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7250, loss[loss=0.07722, simple_loss=0.09629, pruned_loss=0.01661, audio_tagging_loss=0.01246, over 15390.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08903, pruned_loss=0.01199, audio_tagging_loss=0.008761, over 3035413.61 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:52:19,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3735806.6666666665, ans=0.0 2023-11-27 04:52:33,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3735873.3333333335, ans=0.125 2023-11-27 04:52:33,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3735873.3333333335, ans=0.125 2023-11-27 04:52:41,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3735940.0, ans=0.125 2023-11-27 04:52:48,923 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-27 04:52:50,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3735940.0, ans=0.2 2023-11-27 04:52:52,306 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7300, loss[loss=0.07524, simple_loss=0.09907, pruned_loss=0.01734, audio_tagging_loss=0.008362, over 15132.00 frames. 
], tot_loss[loss=0.06482, simple_loss=0.08818, pruned_loss=0.0119, audio_tagging_loss=0.008827, over 3034289.39 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:53:01,369 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=22.5 2023-11-27 04:53:13,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.819e+01 9.611e+01 1.034e+02 1.337e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 04:53:13,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=12.0 2023-11-27 04:53:17,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3736140.0, ans=0.1 2023-11-27 04:53:17,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3736140.0, ans=0.125 2023-11-27 04:53:23,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3736140.0, ans=0.125 2023-11-27 04:53:30,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3736206.6666666665, ans=0.07 2023-11-27 04:53:32,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3736206.6666666665, ans=10.0 2023-11-27 04:53:44,149 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-27 04:53:47,241 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7350, loss[loss=0.07842, simple_loss=0.1054, pruned_loss=0.01754, audio_tagging_loss=0.00819, over 16004.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08865, pruned_loss=0.0119, audio_tagging_loss=0.008623, over 3032391.07 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:53:54,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3736340.0, ans=0.125 2023-11-27 04:54:07,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3736406.6666666665, ans=0.125 2023-11-27 04:54:28,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3736540.0, ans=0.125 2023-11-27 04:54:34,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3736606.6666666665, ans=0.125 2023-11-27 04:54:34,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3736606.6666666665, ans=0.125 2023-11-27 04:54:38,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-27 04:54:42,036 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7400, loss[loss=0.06503, simple_loss=0.08632, pruned_loss=0.01055, audio_tagging_loss=0.01132, over 14946.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08852, pruned_loss=0.01181, audio_tagging_loss=0.008528, over 3030848.72 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:55:02,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3736740.0, ans=0.025 2023-11-27 04:55:05,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.988e+01 9.437e+01 1.029e+02 2.461e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-27 04:55:14,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=15.0 2023-11-27 04:55:35,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-27 04:55:39,278 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7450, loss[loss=0.07958, simple_loss=0.1088, pruned_loss=0.01592, audio_tagging_loss=0.009237, over 15252.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08827, pruned_loss=0.0119, audio_tagging_loss=0.008554, over 3035678.87 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:55:50,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3737073.3333333335, ans=0.0 2023-11-27 04:55:54,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3737073.3333333335, ans=0.025 2023-11-27 04:56:20,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3737206.6666666665, ans=0.0 2023-11-27 04:56:26,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3737273.3333333335, ans=0.125 2023-11-27 04:56:31,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-27 04:56:34,672 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7500, loss[loss=0.07691, simple_loss=0.1121, pruned_loss=0.01339, audio_tagging_loss=0.007487, over 15195.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08934, pruned_loss=0.01204, audio_tagging_loss=0.008459, over 3042872.94 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:56:47,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2023-11-27 04:56:56,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.930e+01 9.677e+01 1.022e+02 1.348e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:57:24,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3737606.6666666665, ans=0.1 2023-11-27 04:57:26,563 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-27 04:57:29,690 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7550, loss[loss=0.04677, simple_loss=0.06443, pruned_loss=0.007565, audio_tagging_loss=0.006992, over 15261.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08931, pruned_loss=0.01201, audio_tagging_loss=0.008433, over 3042774.33 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:57:41,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3737740.0, ans=0.05 2023-11-27 04:58:00,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3737806.6666666665, ans=0.125 2023-11-27 04:58:10,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3737873.3333333335, ans=0.0 2023-11-27 04:58:15,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3737940.0, ans=0.125 2023-11-27 04:58:22,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2023-11-27 04:58:23,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-27 04:58:26,292 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7600, loss[loss=0.06379, simple_loss=0.08932, pruned_loss=0.012, audio_tagging_loss=0.007132, over 14511.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08788, pruned_loss=0.01181, audio_tagging_loss=0.008486, over 3048731.31 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:58:34,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3738006.6666666665, ans=0.0 2023-11-27 04:58:48,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.653e+01 9.351e+01 1.003e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 04:58:55,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3738140.0, ans=0.0 2023-11-27 04:58:56,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3738140.0, ans=0.0 2023-11-27 04:59:02,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3738206.6666666665, ans=0.1 2023-11-27 04:59:08,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-11-27 04:59:14,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738273.3333333335, ans=0.1 2023-11-27 04:59:16,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3738273.3333333335, ans=0.125 2023-11-27 04:59:18,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-27 04:59:20,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3738273.3333333335, ans=0.0 2023-11-27 04:59:22,042 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7650, loss[loss=0.07145, simple_loss=0.1085, pruned_loss=0.00892, audio_tagging_loss=0.008251, over 15301.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08851, pruned_loss=0.01177, audio_tagging_loss=0.008494, over 3049675.36 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:59:22,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3738340.0, ans=0.125 2023-11-27 04:59:25,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3738340.0, ans=0.125 2023-11-27 05:00:05,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2023-11-27 05:00:12,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-27 05:00:13,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-27 05:00:13,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-27 05:00:17,086 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7700, loss[loss=0.06579, simple_loss=0.09295, pruned_loss=0.01206, audio_tagging_loss=0.007262, over 15957.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08882, pruned_loss=0.01175, audio_tagging_loss=0.008555, over 3050515.31 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:00:25,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738673.3333333335, ans=0.1 2023-11-27 05:00:30,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3738740.0, ans=0.125 2023-11-27 05:00:32,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3738740.0, ans=0.125 2023-11-27 05:00:35,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0 2023-11-27 05:00:40,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.124e+01 9.752e+01 1.036e+02 1.277e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:00:44,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738806.6666666665, ans=0.1 2023-11-27 05:00:50,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3738873.3333333335, ans=0.2 2023-11-27 05:01:09,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-27 05:01:13,596 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7750, loss[loss=0.04967, simple_loss=0.06887, pruned_loss=0.008541, audio_tagging_loss=0.006695, over 14718.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08872, pruned_loss=0.01186, audio_tagging_loss=0.008556, over 3047127.85 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:01:15,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3739006.6666666665, ans=0.125 2023-11-27 05:01:21,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3739006.6666666665, ans=0.125 2023-11-27 05:01:50,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3739206.6666666665, ans=0.125 2023-11-27 05:01:52,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3739206.6666666665, ans=0.0 2023-11-27 05:02:05,321 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-27 05:02:08,474 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7800, loss[loss=0.07628, simple_loss=0.1057, pruned_loss=0.01648, audio_tagging_loss=0.006934, over 14528.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08905, pruned_loss=0.01201, audio_tagging_loss=0.008664, over 3042082.38 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:02:23,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3739406.6666666665, ans=0.125 2023-11-27 05:02:28,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-27 05:02:28,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-27 05:02:31,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.958e+01 9.629e+01 1.040e+02 1.238e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 05:02:38,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3739473.3333333335, ans=10.0 2023-11-27 05:02:50,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3739540.0, ans=0.1 2023-11-27 05:03:00,421 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-27 05:03:03,542 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7850, loss[loss=0.07023, simple_loss=0.09205, pruned_loss=0.01577, audio_tagging_loss=0.00843, over 15782.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08878, pruned_loss=0.01193, audio_tagging_loss=0.008685, over 3043455.03 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:03:07,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3739673.3333333335, ans=0.125 2023-11-27 05:03:10,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3739673.3333333335, ans=0.125 2023-11-27 05:03:22,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3739740.0, ans=0.0 2023-11-27 05:03:32,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3739806.6666666665, ans=0.125 2023-11-27 05:03:34,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3739806.6666666665, ans=0.0 2023-11-27 05:03:41,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3739873.3333333335, ans=0.0 2023-11-27 05:03:56,051 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-27 05:03:59,951 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7900, loss[loss=0.07801, simple_loss=0.1071, pruned_loss=0.01882, audio_tagging_loss=0.005619, over 14836.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08975, pruned_loss=0.01212, audio_tagging_loss=0.008691, over 3050505.67 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:04:03,177 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2023-11-27 05:04:12,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3740073.3333333335, ans=0.0 2023-11-27 05:04:15,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3740073.3333333335, ans=0.2 2023-11-27 05:04:23,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.150e+01 9.892e+01 1.047e+02 1.288e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-27 05:04:52,362 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-27 05:04:55,417 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7950, loss[loss=0.05432, simple_loss=0.0686, pruned_loss=0.009484, audio_tagging_loss=0.01053, over 14537.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08825, pruned_loss=0.01187, audio_tagging_loss=0.008806, over 3047382.95 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:04:56,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3740340.0, ans=0.0 2023-11-27 05:05:08,732 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 05:05:19,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3740473.3333333335, ans=0.0 2023-11-27 05:05:22,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3740473.3333333335, ans=0.125 2023-11-27 05:05:35,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3740540.0, ans=0.0 2023-11-27 05:05:43,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3740606.6666666665, ans=0.1 2023-11-27 05:05:47,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-27 05:05:51,024 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8000, loss[loss=0.05142, simple_loss=0.06929, pruned_loss=0.006678, audio_tagging_loss=0.0101, over 16498.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0883, pruned_loss=0.01185, audio_tagging_loss=0.008876, over 3039913.71 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:06:08,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3740740.0, ans=0.125 2023-11-27 05:06:14,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 8.945e+01 9.427e+01 1.017e+02 1.273e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 05:06:22,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3740806.6666666665, ans=0.125 2023-11-27 05:06:27,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3740873.3333333335, ans=0.125 2023-11-27 05:06:42,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-27 05:06:46,431 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8050, loss[loss=0.06776, simple_loss=0.08978, pruned_loss=0.0122, audio_tagging_loss=0.01067, over 15450.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.0883, pruned_loss=0.01192, audio_tagging_loss=0.008923, over 3039636.58 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:06:50,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3741006.6666666665, ans=0.125 2023-11-27 05:07:14,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3741140.0, ans=15.0 2023-11-27 05:07:21,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741206.6666666665, ans=0.1 2023-11-27 05:07:23,613 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2023-11-27 05:07:38,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3741273.3333333335, ans=0.125 2023-11-27 05:07:39,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-27 05:07:42,681 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8100, loss[loss=0.07258, simple_loss=0.1046, pruned_loss=0.0117, audio_tagging_loss=0.008575, over 15241.00 frames. 
], tot_loss[loss=0.06508, simple_loss=0.08848, pruned_loss=0.01202, audio_tagging_loss=0.008816, over 3036399.59 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:07:49,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3741340.0, ans=0.0 2023-11-27 05:07:49,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3741340.0, ans=0.125 2023-11-27 05:08:05,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.986e+01 9.924e+01 1.066e+02 1.404e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-27 05:08:09,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3741473.3333333335, ans=0.125 2023-11-27 05:08:20,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0 2023-11-27 05:08:28,029 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:08:34,235 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-27 05:08:37,355 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8150, loss[loss=0.06671, simple_loss=0.08659, pruned_loss=0.01324, audio_tagging_loss=0.01018, over 13932.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08871, pruned_loss=0.01203, audio_tagging_loss=0.00867, over 3028876.04 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:09:09,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2023-11-27 05:09:11,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741873.3333333335, ans=0.1 2023-11-27 05:09:28,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3741940.0, ans=0.0 2023-11-27 05:09:29,735 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-27 05:09:32,793 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8200, loss[loss=0.05144, simple_loss=0.07309, pruned_loss=0.0069, audio_tagging_loss=0.007995, over 15081.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08947, pruned_loss=0.01224, audio_tagging_loss=0.008607, over 3035194.62 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:09:32,838 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 05:09:57,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 9.029e+01 9.641e+01 1.058e+02 1.267e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 05:10:06,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3742206.6666666665, ans=0.125 2023-11-27 05:10:07,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3742206.6666666665, ans=0.0 2023-11-27 05:10:26,195 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-27 05:10:27,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-11-27 05:10:29,351 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8250, loss[loss=0.05564, simple_loss=0.07935, pruned_loss=0.008162, audio_tagging_loss=0.007798, over 14815.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08916, pruned_loss=0.01206, audio_tagging_loss=0.008526, over 3034695.66 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:10:32,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3742340.0, ans=0.5 2023-11-27 05:10:35,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3742340.0, ans=0.1 2023-11-27 05:10:48,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2023-11-27 05:11:21,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561400 2023-11-27 05:11:24,418 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8300, loss[loss=0.0601, simple_loss=0.07362, pruned_loss=0.01479, audio_tagging_loss=0.008499, over 16430.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08932, pruned_loss=0.01197, audio_tagging_loss=0.008502, over 3035766.45 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:11:26,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3742673.3333333335, ans=0.1 2023-11-27 05:11:27,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. limit=6.0 2023-11-27 05:11:47,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2023-11-27 05:11:49,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.979e+01 9.695e+01 1.038e+02 1.326e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 05:12:06,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3742873.3333333335, ans=0.1 2023-11-27 05:12:16,293 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561450 2023-11-27 05:12:19,462 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8350, loss[loss=0.05865, simple_loss=0.07767, pruned_loss=0.008824, audio_tagging_loss=0.01099, over 15269.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08885, pruned_loss=0.01194, audio_tagging_loss=0.008417, over 3038009.50 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:12:36,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3743073.3333333335, ans=0.125 2023-11-27 05:12:52,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3743206.6666666665, ans=0.5 2023-11-27 05:12:53,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3743206.6666666665, ans=0.125 2023-11-27 05:13:03,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3743273.3333333335, ans=0.125 2023-11-27 05:13:05,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3743273.3333333335, ans=0.09899494936611666 2023-11-27 05:13:13,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561500 2023-11-27 05:13:13,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3743273.3333333335, ans=0.125 2023-11-27 05:13:16,287 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8400, loss[loss=0.06411, simple_loss=0.08204, pruned_loss=0.01219, audio_tagging_loss=0.01089, over 15295.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08851, pruned_loss=0.01185, audio_tagging_loss=0.00845, over 3041898.61 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:13:30,341 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:13:39,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.778e+01 9.390e+01 9.920e+01 1.165e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 05:14:01,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3743606.6666666665, ans=0.125 2023-11-27 05:14:04,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3743606.6666666665, ans=0.1 2023-11-27 05:14:08,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561550 2023-11-27 05:14:11,427 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8450, loss[loss=0.06819, simple_loss=0.09281, pruned_loss=0.01361, audio_tagging_loss=0.008179, over 15369.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08834, pruned_loss=0.01161, audio_tagging_loss=0.008512, over 3041322.18 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:14:43,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3743806.6666666665, ans=0.1 2023-11-27 05:14:55,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2023-11-27 05:15:03,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561600 2023-11-27 05:15:05,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3744006.6666666665, ans=0.1 2023-11-27 05:15:06,444 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8500, loss[loss=0.05452, simple_loss=0.07552, pruned_loss=0.009788, audio_tagging_loss=0.006975, over 15420.00 frames. 
], tot_loss[loss=0.06402, simple_loss=0.08786, pruned_loss=0.01157, audio_tagging_loss=0.008522, over 3046021.01 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:15:23,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3744073.3333333335, ans=0.0 2023-11-27 05:15:31,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.032e+01 9.244e+01 9.812e+01 1.039e+02 1.357e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 05:15:49,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3744273.3333333335, ans=0.125 2023-11-27 05:15:58,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3744273.3333333335, ans=0.025 2023-11-27 05:15:58,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561650 2023-11-27 05:16:03,176 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8550, loss[loss=0.07266, simple_loss=0.1025, pruned_loss=0.01503, audio_tagging_loss=0.006363, over 15705.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08868, pruned_loss=0.01175, audio_tagging_loss=0.008526, over 3052630.01 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:16:08,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2023-11-27 05:16:18,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3744406.6666666665, ans=0.02 2023-11-27 05:16:19,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3744406.6666666665, ans=0.125 2023-11-27 05:16:36,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.31 vs. limit=8.0 2023-11-27 05:16:49,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=22.5 2023-11-27 05:16:54,649 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561700 2023-11-27 05:16:57,854 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8600, loss[loss=0.05692, simple_loss=0.07677, pruned_loss=0.009833, audio_tagging_loss=0.008703, over 15224.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08843, pruned_loss=0.01177, audio_tagging_loss=0.008611, over 3046321.41 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:17:22,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.987e+01 9.604e+01 1.025e+02 1.300e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 05:17:50,179 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561750 2023-11-27 05:17:53,226 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8650, loss[loss=0.08555, simple_loss=0.1267, pruned_loss=0.01624, audio_tagging_loss=0.005978, over 15068.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08866, pruned_loss=0.01179, audio_tagging_loss=0.00871, over 3041992.71 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:18:20,877 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:18:29,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3745206.6666666665, ans=0.125 2023-11-27 05:18:45,597 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561800 2023-11-27 05:18:46,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3745273.3333333335, ans=0.2 2023-11-27 05:18:50,062 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8700, loss[loss=0.05966, simple_loss=0.08372, pruned_loss=0.009099, audio_tagging_loss=0.008705, over 16970.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08881, pruned_loss=0.01168, audio_tagging_loss=0.008693, over 3047814.47 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:18:53,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3745340.0, ans=0.0 2023-11-27 05:18:57,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3745340.0, ans=0.125 2023-11-27 05:19:06,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3745406.6666666665, ans=0.125 2023-11-27 05:19:14,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 9.080e+01 9.687e+01 1.041e+02 1.884e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 05:19:20,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3745473.3333333335, ans=0.2 2023-11-27 05:19:30,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745540.0, ans=0.1 2023-11-27 05:19:30,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3745540.0, ans=0.125 2023-11-27 05:19:42,786 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561850 2023-11-27 05:19:45,865 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8750, loss[loss=0.06117, simple_loss=0.08582, pruned_loss=0.008463, audio_tagging_loss=0.009791, over 14335.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08926, pruned_loss=0.01178, audio_tagging_loss=0.008793, over 3039172.61 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:19:48,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3745673.3333333335, ans=0.0 2023-11-27 05:20:37,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561900 2023-11-27 05:20:40,938 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8800, loss[loss=0.0599, simple_loss=0.08454, pruned_loss=0.00977, audio_tagging_loss=0.007858, over 15160.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09012, pruned_loss=0.01203, audio_tagging_loss=0.008826, over 3043250.15 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:20:48,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.20 vs. 
limit=15.0 2023-11-27 05:20:59,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3746073.3333333335, ans=0.125 2023-11-27 05:21:07,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.208e+01 9.853e+01 1.073e+02 1.310e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 05:21:17,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746206.6666666665, ans=0.1 2023-11-27 05:21:19,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746206.6666666665, ans=0.1 2023-11-27 05:21:29,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5 2023-11-27 05:21:32,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561950 2023-11-27 05:21:37,199 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8850, loss[loss=0.08667, simple_loss=0.1218, pruned_loss=0.01734, audio_tagging_loss=0.008441, over 15074.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0905, pruned_loss=0.01203, audio_tagging_loss=0.008828, over 3046688.72 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:21:46,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. limit=10.0 2023-11-27 05:21:47,256 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:21:48,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-27 05:22:28,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3746606.6666666665, ans=0.0 2023-11-27 05:22:29,708 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562000 2023-11-27 05:22:33,127 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8900, loss[loss=0.07007, simple_loss=0.09144, pruned_loss=0.01377, audio_tagging_loss=0.01058, over 14925.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09086, pruned_loss=0.01192, audio_tagging_loss=0.008679, over 3047746.15 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:22:34,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3746673.3333333335, ans=0.0 2023-11-27 05:22:43,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746740.0, ans=0.1 2023-11-27 05:22:47,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3746740.0, ans=0.125 2023-11-27 05:22:49,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.26 vs. limit=10.0 2023-11-27 05:22:54,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-27 05:22:59,601 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.144e+01 9.662e+01 1.039e+02 1.595e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 05:23:06,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3746873.3333333335, ans=0.0 2023-11-27 05:23:14,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3746873.3333333335, ans=0.125 2023-11-27 05:23:16,275 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-27 05:23:17,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3746940.0, ans=0.125 2023-11-27 05:23:22,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746940.0, ans=0.1 2023-11-27 05:23:25,554 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562050 2023-11-27 05:23:28,619 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8950, loss[loss=0.05658, simple_loss=0.07938, pruned_loss=0.007835, audio_tagging_loss=0.009055, over 15137.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09068, pruned_loss=0.01196, audio_tagging_loss=0.008587, over 3042010.16 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:23:35,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3747006.6666666665, ans=0.125 2023-11-27 05:23:38,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3747073.3333333335, ans=0.125 2023-11-27 05:24:05,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3747206.6666666665, ans=0.0 2023-11-27 05:24:12,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3747273.3333333335, ans=0.125 2023-11-27 05:24:20,695 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562100 2023-11-27 05:24:22,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3747273.3333333335, ans=0.0 2023-11-27 05:24:24,304 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9000, loss[loss=0.06326, simple_loss=0.09017, pruned_loss=0.01076, audio_tagging_loss=0.007416, over 15355.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09122, pruned_loss=0.01207, audio_tagging_loss=0.00852, over 3046913.69 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:24:24,305 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 05:24:50,071 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1436, 2.4498, 5.0206, 3.0325], device='cuda:2') 2023-11-27 05:24:56,578 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05848, simple_loss=0.05048, pruned_loss=0.005329, audio_tagging_loss=0.02791, over 4681554.00 frames. 
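A note on the loss bookkeeping in the entries above: every reported loss value, for training batches and for the validation pass alike, is consistent with a fixed weighted sum of the three logged components, loss ≈ 0.5 · simple_loss + pruned_loss + 1.0 · audio_tagging_loss (for the validation line just above, 0.5 · 0.05048 + 0.005329 + 0.02791 ≈ 0.05848). The sketch below checks this decomposition against two entries from this excerpt; the scale constants are inferred from the logged numbers and the function is illustrative, not code from train_asr.py.

```python
# Illustrative check of the loss decomposition suggested by the log above.
# The 0.5 / 1.0 scales are inferred from the logged values, not read from train_asr.py.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss):
    """Recombine the per-component losses the way the logged totals suggest."""
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Epoch 47, batch 9000 (above): loss=0.06326
assert abs(combined_loss(0.09017, 0.01076, 0.007416) - 0.06326) < 5e-5
# Epoch 47 validation (above): loss=0.05848
assert abs(combined_loss(0.05048, 0.005329, 0.02791) - 0.05848) < 5e-5
```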
2023-11-27 05:24:56,579 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 05:24:56,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3747340.0, ans=0.125 2023-11-27 05:25:01,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3747340.0, ans=0.0 2023-11-27 05:25:03,671 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:25:09,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3747406.6666666665, ans=0.0 2023-11-27 05:25:15,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3747406.6666666665, ans=0.125 2023-11-27 05:25:18,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3747473.3333333335, ans=0.125 2023-11-27 05:25:23,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.971e+01 9.071e+01 9.540e+01 1.018e+02 1.204e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 05:25:26,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3747473.3333333335, ans=0.1 2023-11-27 05:25:47,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3747606.6666666665, ans=0.0 2023-11-27 05:25:49,082 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562150 2023-11-27 05:25:52,133 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9050, loss[loss=0.07397, simple_loss=0.1098, pruned_loss=0.01107, audio_tagging_loss=0.007984, over 14952.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09055, pruned_loss=0.01197, audio_tagging_loss=0.008469, over 3043751.38 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:25:52,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3747673.3333333335, ans=0.125 2023-11-27 05:26:14,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3747806.6666666665, ans=0.2 2023-11-27 05:26:41,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2023-11-27 05:26:44,561 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562200 2023-11-27 05:26:48,186 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9100, loss[loss=0.05608, simple_loss=0.07358, pruned_loss=0.009812, audio_tagging_loss=0.009474, over 15108.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.09024, pruned_loss=0.01189, audio_tagging_loss=0.008455, over 3045753.82 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:26:57,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.32 vs. 
limit=15.0 2023-11-27 05:27:09,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3748140.0, ans=0.0 2023-11-27 05:27:15,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 9.080e+01 9.612e+01 1.021e+02 1.425e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 05:27:23,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3748206.6666666665, ans=0.0 2023-11-27 05:27:29,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3748206.6666666665, ans=0.125 2023-11-27 05:27:30,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-27 05:27:40,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562250 2023-11-27 05:27:43,542 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9150, loss[loss=0.05807, simple_loss=0.08253, pruned_loss=0.0106, audio_tagging_loss=0.006205, over 15045.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08887, pruned_loss=0.01173, audio_tagging_loss=0.008525, over 3047639.40 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:28:19,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3748540.0, ans=0.125 2023-11-27 05:28:28,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3748606.6666666665, ans=0.0 2023-11-27 05:28:35,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562300 2023-11-27 05:28:39,269 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9200, loss[loss=0.06289, simple_loss=0.08387, pruned_loss=0.01405, audio_tagging_loss=0.006902, over 14996.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08835, pruned_loss=0.01163, audio_tagging_loss=0.008547, over 3050883.23 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:28:55,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3748740.0, ans=0.125 2023-11-27 05:29:07,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.097e+01 9.873e+01 1.057e+02 1.295e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:29:17,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3748873.3333333335, ans=0.2 2023-11-27 05:29:31,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562350 2023-11-27 05:29:35,198 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9250, loss[loss=0.06762, simple_loss=0.09742, pruned_loss=0.01172, audio_tagging_loss=0.007186, over 15568.00 frames. ], tot_loss[loss=0.06398, simple_loss=0.08787, pruned_loss=0.01156, audio_tagging_loss=0.008486, over 3043997.81 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:29:52,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3749073.3333333335, ans=0.1 2023-11-27 05:30:27,473 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562400 2023-11-27 05:30:30,780 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9300, loss[loss=0.06835, simple_loss=0.09311, pruned_loss=0.01624, audio_tagging_loss=0.005552, over 16442.00 frames. ], tot_loss[loss=0.06391, simple_loss=0.08764, pruned_loss=0.01155, audio_tagging_loss=0.008537, over 3044669.56 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:30:58,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.060e+01 9.666e+01 1.045e+02 1.304e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 05:31:01,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5 2023-11-27 05:31:22,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562450 2023-11-27 05:31:25,814 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9350, loss[loss=0.06594, simple_loss=0.09182, pruned_loss=0.01202, audio_tagging_loss=0.008009, over 14996.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08843, pruned_loss=0.01168, audio_tagging_loss=0.008554, over 3051647.15 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:31:26,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3749673.3333333335, ans=0.125 2023-11-27 05:32:02,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3749873.3333333335, ans=0.0 2023-11-27 05:32:11,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3749940.0, ans=0.0 2023-11-27 05:32:11,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0 2023-11-27 05:32:18,291 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562500 2023-11-27 05:32:21,969 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9400, loss[loss=0.06823, simple_loss=0.09289, pruned_loss=0.01418, audio_tagging_loss=0.007603, over 14393.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08898, pruned_loss=0.01187, audio_tagging_loss=0.008541, over 3052471.75 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:32:23,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=12.0 2023-11-27 05:32:34,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3750073.3333333335, ans=0.1 2023-11-27 05:32:37,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3750073.3333333335, ans=0.2 2023-11-27 05:32:50,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.326e+01 9.875e+01 1.073e+02 1.247e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:32:52,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.10 vs. 
limit=22.5 2023-11-27 05:33:01,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2023-11-27 05:33:07,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3750273.3333333335, ans=0.125 2023-11-27 05:33:14,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562550 2023-11-27 05:33:16,809 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:33:17,830 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9450, loss[loss=0.04367, simple_loss=0.05585, pruned_loss=0.005607, audio_tagging_loss=0.01014, over 15081.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09034, pruned_loss=0.01218, audio_tagging_loss=0.008538, over 3052917.61 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:33:31,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3750406.6666666665, ans=0.2 2023-11-27 05:33:50,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3750540.0, ans=0.125 2023-11-27 05:34:03,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3750606.6666666665, ans=0.125 2023-11-27 05:34:10,313 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562600 2023-11-27 05:34:11,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2023-11-27 05:34:13,605 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9500, loss[loss=0.06345, simple_loss=0.07935, pruned_loss=0.01182, audio_tagging_loss=0.01195, over 14859.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08964, pruned_loss=0.01208, audio_tagging_loss=0.008706, over 3050549.09 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:34:28,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3750740.0, ans=0.0 2023-11-27 05:34:40,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3750806.6666666665, ans=0.125 2023-11-27 05:34:40,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3750806.6666666665, ans=0.125 2023-11-27 05:34:42,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. 
limit=22.5 2023-11-27 05:34:42,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.981e+01 9.517e+01 1.036e+02 1.547e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 05:34:43,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750806.6666666665, ans=0.1 2023-11-27 05:35:03,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=12.0 2023-11-27 05:35:05,379 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562650 2023-11-27 05:35:08,580 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9550, loss[loss=0.06616, simple_loss=0.08903, pruned_loss=0.01199, audio_tagging_loss=0.009661, over 15013.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0898, pruned_loss=0.0121, audio_tagging_loss=0.008708, over 3053644.71 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:35:23,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3751073.3333333335, ans=0.125 2023-11-27 05:35:44,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3751206.6666666665, ans=0.2 2023-11-27 05:35:49,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3751206.6666666665, ans=0.0 2023-11-27 05:35:49,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-11-27 05:35:50,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3751206.6666666665, ans=0.1 2023-11-27 05:35:53,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3751273.3333333335, ans=0.0 2023-11-27 05:36:02,677 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562700 2023-11-27 05:36:05,794 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9600, loss[loss=0.05894, simple_loss=0.08125, pruned_loss=0.01086, audio_tagging_loss=0.007458, over 15019.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09043, pruned_loss=0.0123, audio_tagging_loss=0.008689, over 3056753.08 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:36:28,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=12.0 2023-11-27 05:36:33,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 9.083e+01 9.744e+01 1.053e+02 1.282e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 05:36:43,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.03 vs. 
limit=15.0 2023-11-27 05:36:47,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3751540.0, ans=0.05 2023-11-27 05:36:49,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3751606.6666666665, ans=0.1 2023-11-27 05:36:57,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562750 2023-11-27 05:37:00,758 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9650, loss[loss=0.05356, simple_loss=0.06664, pruned_loss=0.008994, audio_tagging_loss=0.01124, over 14570.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0901, pruned_loss=0.01223, audio_tagging_loss=0.008715, over 3053137.61 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:37:02,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3751673.3333333335, ans=0.0 2023-11-27 05:37:03,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3751673.3333333335, ans=0.0 2023-11-27 05:37:12,932 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-27 05:37:34,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3751873.3333333335, ans=0.09899494936611666 2023-11-27 05:37:45,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3751940.0, ans=0.125 2023-11-27 05:37:49,972 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2023-11-27 05:37:50,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3751940.0, ans=0.0 2023-11-27 05:37:51,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2023-11-27 05:37:52,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562800 2023-11-27 05:37:56,173 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9700, loss[loss=0.07468, simple_loss=0.09925, pruned_loss=0.01752, audio_tagging_loss=0.007542, over 15438.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.0891, pruned_loss=0.0121, audio_tagging_loss=0.008626, over 3042645.85 frames. 
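
A note on how these train_asr.py:1235 loss fields fit together: loss[...] summarises the current batch, while tot_loss[...] appears to be a running average over recent batches (hence the ~3M-frame totals). The components are consistent with loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss, using the configured scales of 0.5 and 1.0; the simple and pruned terms come from the pruned RNN-T objective. A minimal sketch checking this against the "batch 9700" entry directly above (values copied from the log; the exact combination inside train_asr.py may differ, e.g. during warm-up):

    # Sketch: verify the logged loss decomposition, assuming
    # loss = simple_loss_scale * simple_loss + pruned_loss
    #        + audio_tagging_loss_scale * audio_tagging_loss
    simple_loss_scale = 0.5        # from the run configuration
    audio_tagging_loss_scale = 1.0 # from the run configuration

    # Fields from the "Epoch 47, batch 9700" loss[...] entry above.
    simple_loss = 0.09925
    pruned_loss = 0.01752
    audio_tagging_loss = 0.007542

    loss = (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)
    print(f"{loss:.5f}")  # 0.07469, matching the logged loss=0.07468 up to rounding
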
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:38:02,236 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:38:04,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3752006.6666666665, ans=0.09899494936611666 2023-11-27 05:38:06,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3752073.3333333335, ans=0.125 2023-11-27 05:38:25,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.968e+01 9.559e+01 1.024e+02 1.547e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 05:38:47,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752273.3333333335, ans=0.1 2023-11-27 05:38:48,473 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562850 2023-11-27 05:38:52,141 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9750, loss[loss=0.04961, simple_loss=0.06609, pruned_loss=0.007375, audio_tagging_loss=0.009192, over 14692.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.0889, pruned_loss=0.0119, audio_tagging_loss=0.008598, over 3045375.73 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:39:43,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3752606.6666666665, ans=0.0 2023-11-27 05:39:44,014 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562900 2023-11-27 05:39:47,150 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9800, loss[loss=0.06692, simple_loss=0.09223, pruned_loss=0.01283, audio_tagging_loss=0.007974, over 14873.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08936, pruned_loss=0.01186, audio_tagging_loss=0.00849, over 3042027.42 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:40:04,752 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2023-11-27 05:40:05,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3752740.0, ans=0.2 2023-11-27 05:40:11,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3752806.6666666665, ans=0.125 2023-11-27 05:40:17,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.828e+01 9.556e+01 1.035e+02 1.179e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 05:40:19,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3752806.6666666665, ans=0.07 2023-11-27 05:40:22,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2023-11-27 05:40:24,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3752873.3333333335, ans=0.025 2023-11-27 05:40:36,831 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:40:39,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562950 2023-11-27 05:40:40,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3752940.0, ans=0.125 2023-11-27 05:40:42,227 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9850, loss[loss=0.07977, simple_loss=0.1119, pruned_loss=0.01563, audio_tagging_loss=0.008174, over 17561.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08999, pruned_loss=0.01204, audio_tagging_loss=0.008401, over 3041281.08 frames. ], batch size: 66, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:40:51,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3753006.6666666665, ans=0.0 2023-11-27 05:41:16,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3753206.6666666665, ans=0.125 2023-11-27 05:41:24,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3753206.6666666665, ans=0.2 2023-11-27 05:41:24,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3753206.6666666665, ans=0.125 2023-11-27 05:41:29,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2023-11-27 05:41:31,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3753273.3333333335, ans=0.0 2023-11-27 05:41:34,377 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563000 2023-11-27 05:41:37,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3753340.0, ans=0.0 2023-11-27 05:41:37,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3753340.0, ans=0.0 2023-11-27 05:41:38,593 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9900, loss[loss=0.0613, simple_loss=0.0793, pruned_loss=0.01283, audio_tagging_loss=0.008827, over 15922.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.0902, pruned_loss=0.01203, audio_tagging_loss=0.00841, over 3044793.30 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:42:06,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3753473.3333333335, ans=22.5 2023-11-27 05:42:07,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2023-11-27 05:42:07,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.022e+01 9.450e+01 1.050e+02 2.788e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-27 05:42:13,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. 
limit=10.0 2023-11-27 05:42:31,028 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563050 2023-11-27 05:42:34,132 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9950, loss[loss=0.06257, simple_loss=0.08502, pruned_loss=0.01261, audio_tagging_loss=0.007448, over 15282.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09072, pruned_loss=0.01217, audio_tagging_loss=0.008342, over 3047028.16 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:42:45,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-11-27 05:42:51,652 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=12.0 2023-11-27 05:43:08,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3753873.3333333335, ans=10.0 2023-11-27 05:43:08,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3753873.3333333335, ans=0.125 2023-11-27 05:43:14,535 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:43:24,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3753940.0, ans=0.0 2023-11-27 05:43:26,178 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563100 2023-11-27 05:43:29,294 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10000, loss[loss=0.06136, simple_loss=0.07954, pruned_loss=0.01274, audio_tagging_loss=0.008852, over 14764.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08969, pruned_loss=0.01198, audio_tagging_loss=0.008335, over 3046455.80 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:43:30,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2023-11-27 05:43:48,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3754073.3333333335, ans=0.0 2023-11-27 05:43:52,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-27 05:43:55,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.36 vs. 
limit=22.5 2023-11-27 05:43:59,982 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.953e+01 9.636e+01 1.038e+02 1.515e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 05:44:00,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3754140.0, ans=0.0 2023-11-27 05:44:08,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3754206.6666666665, ans=0.025 2023-11-27 05:44:08,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3754206.6666666665, ans=0.125 2023-11-27 05:44:16,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3754273.3333333335, ans=0.125 2023-11-27 05:44:21,983 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563150 2023-11-27 05:44:25,009 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10050, loss[loss=0.06686, simple_loss=0.1008, pruned_loss=0.009246, audio_tagging_loss=0.007192, over 15530.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08997, pruned_loss=0.01212, audio_tagging_loss=0.008348, over 3053513.64 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:44:25,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3754340.0, ans=0.04949747468305833 2023-11-27 05:45:04,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3754540.0, ans=0.125 2023-11-27 05:45:05,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3754540.0, ans=0.1 2023-11-27 05:45:17,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3754606.6666666665, ans=0.125 2023-11-27 05:45:18,011 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563200 2023-11-27 05:45:21,477 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10100, loss[loss=0.07458, simple_loss=0.1005, pruned_loss=0.01654, audio_tagging_loss=0.007767, over 14945.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08946, pruned_loss=0.01203, audio_tagging_loss=0.008447, over 3054746.18 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:45:28,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3754673.3333333335, ans=0.125 2023-11-27 05:45:35,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3754740.0, ans=0.125 2023-11-27 05:45:51,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.906e+01 9.588e+01 1.046e+02 1.335e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 05:45:52,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2023-11-27 05:46:01,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3754873.3333333335, ans=0.125 2023-11-27 05:46:05,975 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:46:13,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563250 2023-11-27 05:46:17,009 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10150, loss[loss=0.05599, simple_loss=0.0772, pruned_loss=0.00785, audio_tagging_loss=0.009539, over 14744.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08994, pruned_loss=0.01216, audio_tagging_loss=0.008515, over 3050503.92 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:46:19,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755006.6666666665, ans=0.1 2023-11-27 05:46:35,687 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:46:37,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3755073.3333333335, ans=0.125 2023-11-27 05:46:40,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3755140.0, ans=0.5 2023-11-27 05:46:42,896 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:47:09,465 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563300 2023-11-27 05:47:11,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755340.0, ans=0.1 2023-11-27 05:47:12,620 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10200, loss[loss=0.07741, simple_loss=0.1022, pruned_loss=0.01693, audio_tagging_loss=0.009376, over 14613.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09011, pruned_loss=0.0121, audio_tagging_loss=0.008615, over 3049131.04 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:47:18,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2023-11-27 05:47:20,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3755340.0, ans=0.0 2023-11-27 05:47:29,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-27 05:47:32,875 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
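
About the recurring "Exclude cut ... unbalanced/..." warnings: these cuts are AudioSet clips muxed into the ASR stream for the audio-tagging objective, so their transcripts are the placeholder string seen above. A 1-second clip yields 100 feature frames, which the subsampling front end reduces to 23, fewer than the 24 placeholder tokens, so the transducer loss is undefined and the cut is dropped. A sketch of the implied filter, assuming the usual icefall convention for post-subsampling length (the exact rule at train_asr.py:1481 is not shown in the log):

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv2d front end with subsampling factor 4:
        # T -> ((T - 7) // 2 + 1) // 2, which maps 100 input frames to 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer cannot emit more symbols than it has encoder frames,
        # so cuts with fewer post-subsampling frames than tokens are skipped.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> the WARNING above is logged
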
Number of tokens: 24 2023-11-27 05:47:37,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755473.3333333335, ans=0.1 2023-11-27 05:47:42,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 9.089e+01 9.735e+01 1.032e+02 1.277e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 05:47:43,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755473.3333333335, ans=0.1 2023-11-27 05:48:05,872 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563350 2023-11-27 05:48:08,972 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10250, loss[loss=0.05963, simple_loss=0.0825, pruned_loss=0.01027, audio_tagging_loss=0.008111, over 14521.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09057, pruned_loss=0.01224, audio_tagging_loss=0.008591, over 3049850.12 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:48:12,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3755673.3333333335, ans=0.2 2023-11-27 05:48:14,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3755673.3333333335, ans=0.0 2023-11-27 05:48:35,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2023-11-27 05:49:00,930 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563400 2023-11-27 05:49:04,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. limit=10.0 2023-11-27 05:49:04,802 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10300, loss[loss=0.05368, simple_loss=0.06241, pruned_loss=0.009292, audio_tagging_loss=0.01318, over 15950.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08983, pruned_loss=0.01206, audio_tagging_loss=0.008637, over 3053516.55 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:49:25,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3756073.3333333335, ans=0.125 2023-11-27 05:49:32,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3756140.0, ans=0.015 2023-11-27 05:49:34,852 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.184e+01 9.721e+01 1.033e+02 1.459e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 05:49:40,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-11-27 05:49:46,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0 2023-11-27 05:49:56,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563450 2023-11-27 05:49:58,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3756273.3333333335, ans=0.125 2023-11-27 05:50:00,360 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10350, loss[loss=0.05265, simple_loss=0.06832, pruned_loss=0.01059, audio_tagging_loss=0.007902, over 14120.00 frames. 
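
The scaling.py:213 ScheduledFloat lines report module hyperparameters (dropout probabilities, skip rates, balancer limits and the like) whose values are functions of batch_count rather than constants; by batch_count ~3.75e6 most have settled at their final values. A stand-in sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (this illustrates the idea, not the real ScheduledFloat API):

    import bisect

    class PiecewiseLinearSchedule:
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches,
    # then stays flat; by batch_count ~3.75e6 it reports its final value.
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(3749073.33))  # 0.1, like the ans=0.1 entries above
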
], tot_loss[loss=0.06531, simple_loss=0.08918, pruned_loss=0.01196, audio_tagging_loss=0.008756, over 3050446.66 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:50:00,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3756340.0, ans=0.125 2023-11-27 05:50:06,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3756340.0, ans=0.2 2023-11-27 05:50:12,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.40 vs. limit=10.0 2023-11-27 05:50:16,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3756406.6666666665, ans=0.0 2023-11-27 05:50:22,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3756473.3333333335, ans=10.0 2023-11-27 05:50:28,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3756473.3333333335, ans=0.2 2023-11-27 05:50:38,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3756540.0, ans=0.125 2023-11-27 05:50:40,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3756540.0, ans=0.0 2023-11-27 05:50:53,235 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563500 2023-11-27 05:50:56,291 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10400, loss[loss=0.04675, simple_loss=0.05667, pruned_loss=0.006717, audio_tagging_loss=0.0117, over 15711.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08849, pruned_loss=0.0118, audio_tagging_loss=0.008836, over 3047088.02 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:51:00,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3756673.3333333335, ans=0.125 2023-11-27 05:51:08,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3756740.0, ans=0.125 2023-11-27 05:51:26,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.687e+01 8.939e+01 9.704e+01 1.060e+02 1.471e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 05:51:29,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3756873.3333333335, ans=0.125 2023-11-27 05:51:46,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3756940.0, ans=0.125 2023-11-27 05:51:48,468 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563550 2023-11-27 05:51:51,586 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10450, loss[loss=0.07558, simple_loss=0.1064, pruned_loss=0.013, audio_tagging_loss=0.00935, over 16254.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08877, pruned_loss=0.01197, audio_tagging_loss=0.008849, over 3041623.47 frames. 
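
The scaling.py:1022 "Whitening" lines compare a measured statistic of a layer's activations against a configured whitening_limit (itself sometimes scheduled, as the whitening_limit ScheduledFloat entries show). One plausible reading of the "metric=X vs. limit=Y" format: the metric is 1.0 when the channel covariance within a group is isotropic ("white") and grows as the covariance becomes lopsided, with a gradient penalty engaging once it exceeds the limit. A sketch of such a metric (an assumption about scaling.py's internals, not a copy of them):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels), one whitening group.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]          # (C, C) channel covariance
        # mean(eig^2) / mean(eig)^2 >= 1, with equality iff cov is a
        # multiple of I, so larger values mean "less white" activations.
        num = torch.diagonal(cov @ cov).mean()
        den = torch.diagonal(cov).mean() ** 2
        return (num / den).item()

    x = torch.randn(1000, 384)
    print(whitening_metric(x))   # near its minimum of 1 (sampling noise adds a little)
    x[:, :4] *= 20.0             # blow up a few channels
    print(whitening_metric(x))   # far larger: the regime these log lines flag
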
], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:51:57,579 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:52:31,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3757206.6666666665, ans=0.0 2023-11-27 05:52:44,143 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563600 2023-11-27 05:52:47,991 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10500, loss[loss=0.07327, simple_loss=0.1057, pruned_loss=0.01519, audio_tagging_loss=0.005211, over 15537.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08858, pruned_loss=0.01194, audio_tagging_loss=0.008709, over 3041909.44 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:53:07,179 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:53:09,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3757473.3333333335, ans=0.125 2023-11-27 05:53:12,484 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:53:17,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.057e+01 9.549e+01 1.045e+02 1.272e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:53:19,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3757540.0, ans=0.125 2023-11-27 05:53:20,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757540.0, ans=0.1 2023-11-27 05:53:38,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3757606.6666666665, ans=0.0 2023-11-27 05:53:40,864 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563650 2023-11-27 05:53:43,933 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10550, loss[loss=0.07777, simple_loss=0.1041, pruned_loss=0.01611, audio_tagging_loss=0.009594, over 16182.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08861, pruned_loss=0.01186, audio_tagging_loss=0.008712, over 3045747.73 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:54:07,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757806.6666666665, ans=0.1 2023-11-27 05:54:19,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3757873.3333333335, ans=0.125 2023-11-27 05:54:30,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3757940.0, ans=0.125 2023-11-27 05:54:36,085 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563700 2023-11-27 05:54:39,314 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10600, loss[loss=0.05366, simple_loss=0.07288, pruned_loss=0.01011, audio_tagging_loss=0.007118, over 14781.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.08744, pruned_loss=0.01157, audio_tagging_loss=0.008653, over 3046279.01 frames. 
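
The optim.py:476 lines summarise the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max) and derive the clipping threshold as Clipping_scale times the running median: in the entry above, 2.0 * 9.549e+01 gives the reported threshold=1.910e+02, and percent-clipped is the share of recent batches whose norm exceeded it. A sketch of median-based clipping in that spirit (ScaledAdam's actual bookkeeping, warm-up, and per-parameter handling differ):

    from collections import deque
    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total grad norms

        def clip_(self, params) -> float:
            params = [p for p in params if p.grad is not None]
            norm = torch.norm(
                torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            q = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # 2.0 * median
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold
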
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:54:40,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=22.5 2023-11-27 05:55:06,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3758140.0, ans=0.0 2023-11-27 05:55:10,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.995e+01 9.549e+01 1.050e+02 1.253e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:55:30,985 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563750 2023-11-27 05:55:34,683 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10650, loss[loss=0.06582, simple_loss=0.1004, pruned_loss=0.009302, audio_tagging_loss=0.006324, over 15542.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08804, pruned_loss=0.01162, audio_tagging_loss=0.008619, over 3047109.04 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:55:34,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3758340.0, ans=0.0 2023-11-27 05:55:40,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3758340.0, ans=0.125 2023-11-27 05:56:02,904 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:56:07,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3758540.0, ans=0.5 2023-11-27 05:56:15,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2023-11-27 05:56:27,484 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563800 2023-11-27 05:56:31,172 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10700, loss[loss=0.06267, simple_loss=0.08628, pruned_loss=0.01059, audio_tagging_loss=0.008933, over 15749.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08898, pruned_loss=0.01177, audio_tagging_loss=0.008557, over 3052126.26 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:56:40,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3758740.0, ans=0.0 2023-11-27 05:56:40,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3758740.0, ans=0.1 2023-11-27 05:56:47,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758740.0, ans=0.1 2023-11-27 05:56:48,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.88 vs. 
limit=22.5 2023-11-27 05:56:53,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3758806.6666666665, ans=0.2 2023-11-27 05:56:58,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3758806.6666666665, ans=0.0 2023-11-27 05:56:59,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3758806.6666666665, ans=0.125 2023-11-27 05:57:01,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.877e+01 9.430e+01 1.025e+02 1.516e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 05:57:04,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3758873.3333333335, ans=0.0 2023-11-27 05:57:12,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3758873.3333333335, ans=0.0 2023-11-27 05:57:22,429 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-27 05:57:25,453 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10750, loss[loss=0.05891, simple_loss=0.08457, pruned_loss=0.006845, audio_tagging_loss=0.009776, over 15251.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08874, pruned_loss=0.01175, audio_tagging_loss=0.008563, over 3048258.34 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:57:39,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-11-27 05:57:40,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3759073.3333333335, ans=0.0 2023-11-27 05:57:47,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759140.0, ans=0.1 2023-11-27 05:58:01,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=22.5 2023-11-27 05:58:08,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-27 05:58:14,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759273.3333333335, ans=0.1 2023-11-27 05:58:17,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-27 05:58:20,826 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10800, loss[loss=0.06594, simple_loss=0.09658, pruned_loss=0.01119, audio_tagging_loss=0.006453, over 15674.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08936, pruned_loss=0.01183, audio_tagging_loss=0.008433, over 3046593.92 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:58:24,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.20 vs. 
limit=12.0 2023-11-27 05:58:47,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3759473.3333333335, ans=0.07 2023-11-27 05:58:52,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.069e+01 9.752e+01 1.037e+02 1.652e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:59:04,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3759606.6666666665, ans=0.05 2023-11-27 05:59:14,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-27 05:59:17,769 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10850, loss[loss=0.06287, simple_loss=0.08628, pruned_loss=0.01136, audio_tagging_loss=0.008367, over 16931.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09063, pruned_loss=0.01214, audio_tagging_loss=0.008487, over 3052662.62 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:59:17,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3759673.3333333335, ans=0.0 2023-11-27 05:59:23,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3759673.3333333335, ans=0.2 2023-11-27 05:59:26,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2023-11-27 05:59:35,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3759740.0, ans=0.07 2023-11-27 05:59:43,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2023-11-27 05:59:51,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3759873.3333333335, ans=0.125 2023-11-27 05:59:53,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3759873.3333333335, ans=0.0 2023-11-27 05:59:54,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3759873.3333333335, ans=0.1 2023-11-27 06:00:00,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3759873.3333333335, ans=0.125 2023-11-27 06:00:04,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3759940.0, ans=0.035 2023-11-27 06:00:08,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3759940.0, ans=0.04949747468305833 2023-11-27 06:00:10,335 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:00:10,395 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-27 06:00:15,711 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10900, loss[loss=0.06058, simple_loss=0.07482, pruned_loss=0.0102, audio_tagging_loss=0.01296, over 15344.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0904, pruned_loss=0.01203, audio_tagging_loss=0.008472, over 3055326.27 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:00:41,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3760140.0, ans=0.125 2023-11-27 06:00:47,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.933e+01 9.704e+01 1.050e+02 1.255e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:00:50,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0 2023-11-27 06:01:07,488 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-27 06:01:08,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-11-27 06:01:10,580 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10950, loss[loss=0.07084, simple_loss=0.0919, pruned_loss=0.0158, audio_tagging_loss=0.009097, over 14022.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08908, pruned_loss=0.01177, audio_tagging_loss=0.008556, over 3051572.73 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:01:12,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3760340.0, ans=0.0 2023-11-27 06:01:16,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3760340.0, ans=0.2 2023-11-27 06:01:21,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3760406.6666666665, ans=0.125 2023-11-27 06:01:39,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3760473.3333333335, ans=0.0 2023-11-27 06:02:02,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2023-11-27 06:02:03,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-27 06:02:06,786 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11000, loss[loss=0.05877, simple_loss=0.07333, pruned_loss=0.0103, audio_tagging_loss=0.0118, over 14803.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.0902, pruned_loss=0.01192, audio_tagging_loss=0.008546, over 3054817.82 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:02:09,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.79 vs. limit=10.0 2023-11-27 06:02:15,272 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:02:38,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.834e+01 9.840e+01 1.033e+02 1.285e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:02:45,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3760873.3333333335, ans=0.05 2023-11-27 06:02:54,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3760940.0, ans=0.0 2023-11-27 06:02:59,700 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-27 06:03:01,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3761006.6666666665, ans=0.0 2023-11-27 06:03:02,825 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11050, loss[loss=0.0697, simple_loss=0.09402, pruned_loss=0.01486, audio_tagging_loss=0.007834, over 14535.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09098, pruned_loss=0.012, audio_tagging_loss=0.008622, over 3053631.85 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:03:05,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3761006.6666666665, ans=0.2 2023-11-27 06:03:05,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3761006.6666666665, ans=0.05 2023-11-27 06:03:21,304 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2023-11-27 06:03:32,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3761140.0, ans=0.125 2023-11-27 06:03:44,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3761206.6666666665, ans=0.125 2023-11-27 06:03:46,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-27 06:03:54,075 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-27 06:03:57,486 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11100, loss[loss=0.05151, simple_loss=0.07257, pruned_loss=0.00838, audio_tagging_loss=0.006844, over 14963.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08951, pruned_loss=0.01185, audio_tagging_loss=0.008785, over 3044215.77 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:04:11,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3761406.6666666665, ans=0.0 2023-11-27 06:04:30,088 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.171e+01 9.681e+01 1.044e+02 1.229e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 06:04:44,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.59 vs. 
limit=15.0 2023-11-27 06:04:45,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-11-27 06:04:47,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3761606.6666666665, ans=0.125 2023-11-27 06:04:49,830 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-27 06:04:52,951 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11150, loss[loss=0.05636, simple_loss=0.0752, pruned_loss=0.009919, audio_tagging_loss=0.008846, over 15532.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08937, pruned_loss=0.01193, audio_tagging_loss=0.008815, over 3037473.83 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:04:53,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3761673.3333333335, ans=0.125 2023-11-27 06:04:58,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3761673.3333333335, ans=0.125 2023-11-27 06:05:32,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3761873.3333333335, ans=0.1 2023-11-27 06:05:32,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3761873.3333333335, ans=0.125 2023-11-27 06:05:45,965 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-27 06:05:48,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2023-11-27 06:05:49,605 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11200, loss[loss=0.07185, simple_loss=0.1077, pruned_loss=0.01058, audio_tagging_loss=0.007426, over 15757.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09021, pruned_loss=0.01211, audio_tagging_loss=0.008788, over 3040738.40 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:06:04,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3762073.3333333335, ans=0.125 2023-11-27 06:06:12,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=10.0 2023-11-27 06:06:22,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 9.248e+01 9.841e+01 1.044e+02 1.353e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:06:36,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3762273.3333333335, ans=0.05 2023-11-27 06:06:41,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-27 06:06:44,803 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11250, loss[loss=0.07354, simple_loss=0.08951, pruned_loss=0.01826, audio_tagging_loss=0.01053, over 14907.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08954, pruned_loss=0.01214, audio_tagging_loss=0.008916, over 3037971.02 frames. 
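
The grad_scale field in the loss lines tracks fp16 dynamic loss scaling (the run trains with use_fp16=True): the scale is cut in half when a step produces inf/NaN gradients and grown back after a stretch of clean steps, which is why it wanders between 8.0 and 32.0 through this section. A sketch with PyTorch's GradScaler, whose constants here are illustrative rather than the run's actual settings:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,      # matches the grad_scale seen in these entries
        growth_factor=2.0,    # doubled after enough overflow-free steps
        backoff_factor=0.5,   # halved whenever grads contain inf/NaN
        growth_interval=2000)

    # Typical step (model, optimizer, and batch assumed to exist):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()            # adjusts the scale -> the 8/16/32 swings
    #   print(scaler.get_scale())
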
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:06:46,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3762340.0, ans=0.125 2023-11-27 06:06:46,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3762340.0, ans=12.0 2023-11-27 06:06:59,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3762406.6666666665, ans=0.95 2023-11-27 06:07:00,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3762406.6666666665, ans=0.0 2023-11-27 06:07:05,191 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2023-11-27 06:07:17,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3762540.0, ans=0.1 2023-11-27 06:07:36,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-27 06:07:40,555 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11300, loss[loss=0.06302, simple_loss=0.08545, pruned_loss=0.01165, audio_tagging_loss=0.008644, over 16005.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08916, pruned_loss=0.0121, audio_tagging_loss=0.008813, over 3032861.38 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:07:47,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=22.5 2023-11-27 06:07:51,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3762740.0, ans=0.09899494936611666 2023-11-27 06:07:58,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2023-11-27 06:08:01,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3762740.0, ans=0.125 2023-11-27 06:08:13,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 9.028e+01 9.588e+01 1.026e+02 1.216e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:08:33,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-27 06:08:33,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3762940.0, ans=0.125 2023-11-27 06:08:36,655 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11350, loss[loss=0.07203, simple_loss=0.09884, pruned_loss=0.01442, audio_tagging_loss=0.008182, over 13406.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08949, pruned_loss=0.01219, audio_tagging_loss=0.008663, over 3033874.62 frames. 
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:08:44,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3763006.6666666665, ans=0.05 2023-11-27 06:09:00,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3763140.0, ans=0.125 2023-11-27 06:09:08,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763206.6666666665, ans=0.1 2023-11-27 06:09:11,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3763206.6666666665, ans=0.125 2023-11-27 06:09:11,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3763206.6666666665, ans=0.2 2023-11-27 06:09:23,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=15.0 2023-11-27 06:09:28,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-27 06:09:30,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3763273.3333333335, ans=0.0 2023-11-27 06:09:31,966 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11400, loss[loss=0.04938, simple_loss=0.06172, pruned_loss=0.008753, audio_tagging_loss=0.009768, over 15636.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08887, pruned_loss=0.01218, audio_tagging_loss=0.008588, over 3037145.39 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:09:35,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-11-27 06:09:41,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3763406.6666666665, ans=0.125 2023-11-27 06:09:45,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3763406.6666666665, ans=0.0 2023-11-27 06:09:46,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3763406.6666666665, ans=10.0 2023-11-27 06:09:47,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3763406.6666666665, ans=0.125 2023-11-27 06:09:56,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3763473.3333333335, ans=0.125 2023-11-27 06:10:02,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3763473.3333333335, ans=0.125 2023-11-27 06:10:05,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 9.070e+01 9.706e+01 1.041e+02 1.301e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:10:22,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. 
limit=15.0 2023-11-27 06:10:23,652 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-27 06:10:23,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3763606.6666666665, ans=0.125 2023-11-27 06:10:27,262 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11450, loss[loss=0.06793, simple_loss=0.09359, pruned_loss=0.01017, audio_tagging_loss=0.01096, over 15472.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08916, pruned_loss=0.01221, audio_tagging_loss=0.008551, over 3037082.01 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:10:36,729 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-27 06:10:41,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3763740.0, ans=0.125 2023-11-27 06:10:51,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3763806.6666666665, ans=0.0 2023-11-27 06:10:58,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3763806.6666666665, ans=0.0 2023-11-27 06:11:13,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3763940.0, ans=0.0 2023-11-27 06:11:16,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3763940.0, ans=0.125 2023-11-27 06:11:18,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3763940.0, ans=0.125 2023-11-27 06:11:19,514 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-27 06:11:23,083 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11500, loss[loss=0.07034, simple_loss=0.1078, pruned_loss=0.01102, audio_tagging_loss=0.00541, over 15442.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08895, pruned_loss=0.01204, audio_tagging_loss=0.008599, over 3042542.84 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:11:26,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3764006.6666666665, ans=0.125 2023-11-27 06:11:39,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.93 vs. 
limit=15.0 2023-11-27 06:11:55,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.975e+01 9.589e+01 1.045e+02 1.244e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:11:57,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3764206.6666666665, ans=0.2 2023-11-27 06:12:08,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3764273.3333333335, ans=0.125 2023-11-27 06:12:08,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3764273.3333333335, ans=0.0 2023-11-27 06:12:14,906 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-27 06:12:18,057 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11550, loss[loss=0.05739, simple_loss=0.07985, pruned_loss=0.007579, audio_tagging_loss=0.009889, over 15094.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.0893, pruned_loss=0.01193, audio_tagging_loss=0.00853, over 3045891.73 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 06:12:20,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3764340.0, ans=0.125 2023-11-27 06:12:25,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3764340.0, ans=0.0 2023-11-27 06:12:28,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3764406.6666666665, ans=0.125 2023-11-27 06:12:34,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3764406.6666666665, ans=0.0 2023-11-27 06:12:42,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3764473.3333333335, ans=15.0 2023-11-27 06:12:50,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3764473.3333333335, ans=0.125 2023-11-27 06:12:52,045 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:12:52,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3764540.0, ans=0.125 2023-11-27 06:12:57,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2023-11-27 06:13:10,514 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-27 06:13:13,678 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11600, loss[loss=0.04488, simple_loss=0.04896, pruned_loss=0.009234, audio_tagging_loss=0.01116, over 14808.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08908, pruned_loss=0.01199, audio_tagging_loss=0.008566, over 3042446.35 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:13:38,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3764806.6666666665, ans=0.2 2023-11-27 06:13:48,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.097e+01 9.719e+01 1.053e+02 1.388e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:13:49,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-27 06:13:54,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3764873.3333333335, ans=0.0 2023-11-27 06:14:06,622 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-27 06:14:09,721 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11650, loss[loss=0.07873, simple_loss=0.1084, pruned_loss=0.01431, audio_tagging_loss=0.0102, over 15639.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08954, pruned_loss=0.01204, audio_tagging_loss=0.008514, over 3048581.88 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:14:22,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3765073.3333333335, ans=0.2 2023-11-27 06:14:47,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3765206.6666666665, ans=0.2 2023-11-27 06:15:02,219 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-27 06:15:05,647 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11700, loss[loss=0.04983, simple_loss=0.06637, pruned_loss=0.006215, audio_tagging_loss=0.01043, over 15605.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08928, pruned_loss=0.01195, audio_tagging_loss=0.008532, over 3050862.47 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:15:06,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.62 vs. 
limit=15.0 2023-11-27 06:15:11,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3765340.0, ans=0.2 2023-11-27 06:15:11,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3765340.0, ans=10.0 2023-11-27 06:15:15,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3765340.0, ans=0.125 2023-11-27 06:15:20,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3765406.6666666665, ans=0.125 2023-11-27 06:15:37,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3765473.3333333335, ans=0.2 2023-11-27 06:15:40,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.890e+01 9.642e+01 1.040e+02 1.676e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 06:15:41,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3765540.0, ans=0.125 2023-11-27 06:15:42,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3765540.0, ans=0.035 2023-11-27 06:15:55,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2023-11-27 06:15:58,286 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-27 06:16:01,389 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11750, loss[loss=0.08442, simple_loss=0.1257, pruned_loss=0.01317, audio_tagging_loss=0.008425, over 16548.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08956, pruned_loss=0.01198, audio_tagging_loss=0.008667, over 3054999.49 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:16:03,900 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2023-11-27 06:16:07,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3765673.3333333335, ans=0.0 2023-11-27 06:16:14,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3765740.0, ans=0.2 2023-11-27 06:16:29,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3765806.6666666665, ans=0.0 2023-11-27 06:16:51,635 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:16:52,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3765940.0, ans=0.0 2023-11-27 06:16:54,073 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-27 06:16:57,159 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11800, loss[loss=0.08643, simple_loss=0.1158, pruned_loss=0.02207, audio_tagging_loss=0.006482, over 15994.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08977, pruned_loss=0.01204, audio_tagging_loss=0.008733, over 3055314.02 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:17:31,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 8.890e+01 9.734e+01 1.045e+02 1.276e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 06:17:33,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2023-11-27 06:17:44,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3766273.3333333335, ans=0.125 2023-11-27 06:17:49,607 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-27 06:17:52,761 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11850, loss[loss=0.07405, simple_loss=0.09264, pruned_loss=0.01609, audio_tagging_loss=0.01164, over 14002.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08914, pruned_loss=0.01188, audio_tagging_loss=0.008803, over 3047397.71 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:17:55,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3766340.0, ans=0.1 2023-11-27 06:17:57,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3766340.0, ans=0.125 2023-11-27 06:18:03,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3766406.6666666665, ans=0.1 2023-11-27 06:18:24,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3766473.3333333335, ans=0.125 2023-11-27 06:18:30,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3766540.0, ans=0.125 2023-11-27 06:18:37,599 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:18:43,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=22.5 2023-11-27 06:18:44,767 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-27 06:18:48,173 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11900, loss[loss=0.06415, simple_loss=0.08556, pruned_loss=0.01335, audio_tagging_loss=0.008013, over 15596.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08909, pruned_loss=0.01183, audio_tagging_loss=0.008898, over 3049789.03 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:19:01,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3766740.0, ans=0.125 2023-11-27 06:19:03,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3766740.0, ans=0.125 2023-11-27 06:19:07,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3766740.0, ans=0.125 2023-11-27 06:19:09,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3766740.0, ans=0.2 2023-11-27 06:19:23,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.781e+01 9.449e+01 1.024e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 06:19:41,262 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-27 06:19:44,897 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11950, loss[loss=0.04691, simple_loss=0.06255, pruned_loss=0.006102, audio_tagging_loss=0.009528, over 15340.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08824, pruned_loss=0.0117, audio_tagging_loss=0.008986, over 3046882.03 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:19:45,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3767006.6666666665, ans=0.1 2023-11-27 06:19:56,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-27 06:19:58,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3767073.3333333335, ans=10.0 2023-11-27 06:20:08,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3767140.0, ans=0.125 2023-11-27 06:20:24,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3767206.6666666665, ans=0.0 2023-11-27 06:20:35,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-27 06:20:35,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3767273.3333333335, ans=0.1 2023-11-27 06:20:36,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3767273.3333333335, ans=0.125 2023-11-27 06:20:38,326 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 12000, loss[loss=0.08117, simple_loss=0.1135, pruned_loss=0.01794, audio_tagging_loss=0.006489, over 16574.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08937, pruned_loss=0.01197, audio_tagging_loss=0.00903, over 3049538.57 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:20:38,327 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 06:21:10,518 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.0578, simple_loss=0.05045, pruned_loss=0.005285, audio_tagging_loss=0.02729, over 4681554.00 frames. 
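Editor's note: the per-batch records above report four quantities (total loss plus simple_loss, pruned_loss and audio_tagging_loss). The logged totals are consistent with a weighted sum in which the simple (linear) transducer loss enters at half weight and the pruned and audio-tagging terms at full weight. Below is a back-of-the-envelope check against three records from this section; the 0.5/1.0/1.0 weights are inferred from the logged numbers, not read out of train_asr.py.

    # Hedged sanity check: do the logged components reproduce the logged total,
    # assuming loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss?
    # The 0.5/1.0/1.0 weights are an inference from the numbers, not from source.
    records = [
        # (loss, simple_loss, pruned_loss, audio_tagging_loss)
        (0.06569, 0.08937, 0.01197, 0.00903),    # Epoch 47, batch 12000 (train)
        (0.06527, 0.08909, 0.01183, 0.008898),   # Epoch 47, batch 11900 (train)
        (0.0578,  0.05045, 0.005285, 0.02729),   # Epoch 47, validation
    ]
    for loss, simple, pruned, at in records:
        recon = 0.5 * simple + pruned + at
        assert abs(recon - loss) < 5e-4, (recon, loss)
        print(f"logged={loss:.5f}  reconstructed={recon:.5f}")

The same decomposition holds for the validation record, where audio_tagging_loss is the largest single component of the total at this point in training.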
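The optim.py:476 records ("Clipping_scale=2.0, grad-norm quartiles ... threshold=...") fit a simple rule: the clipping threshold equals clipping_scale times the median gradient norm, i.e. the middle of the five reported quantiles (min, 25%, median, 75%, max). A minimal check against two records from this stretch; how the real optimizer smooths or accumulates this statistic is not visible in the log and is assumed away here.

    # Assumption: threshold = clipping_scale * median grad norm, where the five
    # logged values are (min, 25%, median, 75%, max) of recent gradient norms.
    clipping_scale = 2.0
    records = [
        # (five quantiles, logged threshold), from optim.py:476 records above
        ([73.87, 92.23, 99.44, 108.4, 146.7], 198.9),
        ([72.31, 89.75, 95.89, 104.5, 124.4], 191.8),
    ]
    for quantiles, logged in records:
        median = quantiles[2]
        assert abs(clipping_scale * median - logged) / logged < 0.01
        print(f"2 x median = {clipping_scale * median:.1f}, logged = {logged:.1f}")

Consistent with that, percent-clipped is 0.0 in almost every record here (a single 1.0 appears around batch 1000 of epoch 48), so a threshold of twice the median norm almost never fires at this stage of training.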
2023-11-27 06:21:10,518 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 06:21:14,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5 2023-11-27 06:21:14,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3767340.0, ans=10.0 2023-11-27 06:21:28,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3767406.6666666665, ans=0.0 2023-11-27 06:22:01,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-27 06:22:02,635 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 0, loss[loss=0.09001, simple_loss=0.1138, pruned_loss=0.0158, audio_tagging_loss=0.01732, over 16248.00 frames. ], tot_loss[loss=0.09001, simple_loss=0.1138, pruned_loss=0.0158, audio_tagging_loss=0.01732, over 16248.00 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:22:02,636 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 06:22:20,256 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1328, 2.4192, 5.0707, 2.9816], device='cuda:2') 2023-11-27 06:22:25,565 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4912, 3.8401, 3.1182, 3.8331], device='cuda:2') 2023-11-27 06:22:30,828 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9557, 3.1300, 2.9252, 3.1174, 3.3510, 2.7713, 3.4091, 2.6033], device='cuda:2') 2023-11-27 06:22:33,987 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05791, simple_loss=0.05045, pruned_loss=0.005281, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-27 06:22:33,987 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 06:22:39,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3767493.3333333335, ans=0.0 2023-11-27 06:22:43,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.223e+01 9.944e+01 1.084e+02 1.467e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 06:22:50,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3767560.0, ans=0.0 2023-11-27 06:23:00,868 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-27 06:23:01,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2023-11-27 06:23:08,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3767693.3333333335, ans=0.0 2023-11-27 06:23:10,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. 
limit=6.0 2023-11-27 06:23:14,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3767693.3333333335, ans=0.035 2023-11-27 06:23:22,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3767760.0, ans=0.1 2023-11-27 06:23:30,011 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 50, loss[loss=0.06508, simple_loss=0.08179, pruned_loss=0.01029, audio_tagging_loss=0.0139, over 15452.00 frames. ], tot_loss[loss=0.07105, simple_loss=0.08667, pruned_loss=0.01092, audio_tagging_loss=0.0168, over 683402.56 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:23:39,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3767826.6666666665, ans=0.0 2023-11-27 06:23:41,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3767893.3333333335, ans=0.125 2023-11-27 06:23:50,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3767960.0, ans=0.125 2023-11-27 06:23:55,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. limit=10.0 2023-11-27 06:23:56,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-27 06:24:02,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3768026.6666666665, ans=0.125 2023-11-27 06:24:02,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2023-11-27 06:24:05,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-27 06:24:24,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3768093.3333333335, ans=0.0 2023-11-27 06:24:26,090 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 100, loss[loss=0.06532, simple_loss=0.09039, pruned_loss=0.009625, audio_tagging_loss=0.0105, over 14832.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.0869, pruned_loss=0.01125, audio_tagging_loss=0.01611, over 1199254.72 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:24:27,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3768160.0, ans=0.125 2023-11-27 06:24:29,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3768160.0, ans=0.125 2023-11-27 06:24:31,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3768160.0, ans=0.5 2023-11-27 06:24:35,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.383e+01 1.012e+02 1.072e+02 1.151e+02 1.382e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-27 06:24:39,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3768226.6666666665, ans=0.125 2023-11-27 06:24:44,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768226.6666666665, ans=0.1 2023-11-27 06:24:46,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3768226.6666666665, ans=0.0 2023-11-27 06:24:52,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3768293.3333333335, ans=0.1 2023-11-27 06:24:53,262 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-27 06:25:02,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3768360.0, ans=0.0 2023-11-27 06:25:12,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3768426.6666666665, ans=0.125 2023-11-27 06:25:14,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3768426.6666666665, ans=0.1 2023-11-27 06:25:18,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0 2023-11-27 06:25:21,914 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 150, loss[loss=0.07341, simple_loss=0.1009, pruned_loss=0.01579, audio_tagging_loss=0.007196, over 16374.00 frames. ], tot_loss[loss=0.06967, simple_loss=0.0874, pruned_loss=0.01152, audio_tagging_loss=0.01445, over 1609502.27 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:25:32,028 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:25:32,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3768560.0, ans=0.2 2023-11-27 06:25:33,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.77 vs. 
limit=22.5 2023-11-27 06:25:39,723 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:25:39,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3768560.0, ans=0.0 2023-11-27 06:25:46,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3768626.6666666665, ans=0.04949747468305833 2023-11-27 06:25:49,129 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-27 06:26:04,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768693.3333333335, ans=0.1 2023-11-27 06:26:07,149 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0 2023-11-27 06:26:18,031 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 200, loss[loss=0.0695, simple_loss=0.09773, pruned_loss=0.01082, audio_tagging_loss=0.009814, over 15360.00 frames. ], tot_loss[loss=0.06935, simple_loss=0.0898, pruned_loss=0.01182, audio_tagging_loss=0.01263, over 1931902.48 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:26:28,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 9.239e+01 9.831e+01 1.046e+02 1.283e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 06:26:32,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3768893.3333333335, ans=0.0 2023-11-27 06:26:39,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3768960.0, ans=10.0 2023-11-27 06:26:44,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-27 06:27:13,808 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 250, loss[loss=0.06404, simple_loss=0.09268, pruned_loss=0.01029, audio_tagging_loss=0.007406, over 15077.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.08988, pruned_loss=0.01177, audio_tagging_loss=0.01134, over 2183761.22 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:27:15,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3769160.0, ans=0.125 2023-11-27 06:27:34,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3769293.3333333335, ans=0.125 2023-11-27 06:27:40,220 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-27 06:27:45,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3769293.3333333335, ans=0.07 2023-11-27 06:27:55,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3769360.0, ans=0.0 2023-11-27 06:27:55,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3769360.0, ans=0.0 2023-11-27 06:28:04,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3769426.6666666665, ans=0.0 2023-11-27 06:28:09,441 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 300, loss[loss=0.04905, simple_loss=0.06503, pruned_loss=0.007966, audio_tagging_loss=0.008566, over 14356.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09035, pruned_loss=0.01181, audio_tagging_loss=0.01053, over 2375963.61 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:28:20,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 9.040e+01 9.670e+01 1.035e+02 1.237e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 06:28:28,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.04 vs. limit=22.5 2023-11-27 06:28:31,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3769626.6666666665, ans=0.0 2023-11-27 06:28:37,050 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-27 06:28:40,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2023-11-27 06:29:04,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3769826.6666666665, ans=0.125 2023-11-27 06:29:05,746 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 350, loss[loss=0.05587, simple_loss=0.07783, pruned_loss=0.009811, audio_tagging_loss=0.007141, over 15085.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09073, pruned_loss=0.012, audio_tagging_loss=0.01001, over 2532062.01 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:29:15,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=12.0 2023-11-27 06:29:32,233 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-27 06:29:51,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3770093.3333333335, ans=0.125 2023-11-27 06:29:52,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3770093.3333333335, ans=0.125 2023-11-27 06:30:01,402 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 400, loss[loss=0.04764, simple_loss=0.06641, pruned_loss=0.005542, audio_tagging_loss=0.008888, over 14776.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09089, pruned_loss=0.01202, audio_tagging_loss=0.00963, over 2644638.62 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:30:11,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.851e+01 9.492e+01 1.029e+02 1.198e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:30:14,280 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:30:14,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3770226.6666666665, ans=0.2 2023-11-27 06:30:18,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3770226.6666666665, ans=0.0 2023-11-27 06:30:27,488 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-27 06:30:56,994 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 450, loss[loss=0.06538, simple_loss=0.08555, pruned_loss=0.0142, audio_tagging_loss=0.008412, over 14740.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09022, pruned_loss=0.01201, audio_tagging_loss=0.00932, over 2729009.47 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:31:02,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3770493.3333333335, ans=0.2 2023-11-27 06:31:12,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3770560.0, ans=0.125 2023-11-27 06:31:24,669 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-27 06:31:46,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-27 06:31:53,041 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 500, loss[loss=0.05535, simple_loss=0.0765, pruned_loss=0.008147, audio_tagging_loss=0.008948, over 14441.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08957, pruned_loss=0.01176, audio_tagging_loss=0.00915, over 2797458.07 frames. 
], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:31:53,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3770826.6666666665, ans=0.0 2023-11-27 06:32:05,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 9.003e+01 9.804e+01 1.033e+02 1.335e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 06:32:05,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3770893.3333333335, ans=0.125 2023-11-27 06:32:18,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-27 06:32:20,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-27 06:32:21,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3770960.0, ans=6.0 2023-11-27 06:32:32,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3771026.6666666665, ans=0.0 2023-11-27 06:32:50,023 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 550, loss[loss=0.07271, simple_loss=0.09976, pruned_loss=0.01364, audio_tagging_loss=0.009188, over 16057.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08962, pruned_loss=0.01188, audio_tagging_loss=0.009021, over 2849288.36 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:32:51,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3771160.0, ans=0.125 2023-11-27 06:33:16,293 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-27 06:33:29,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2023-11-27 06:33:42,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3771426.6666666665, ans=0.125 2023-11-27 06:33:45,510 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 600, loss[loss=0.07088, simple_loss=0.1016, pruned_loss=0.01244, audio_tagging_loss=0.007617, over 14831.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09031, pruned_loss=0.01202, audio_tagging_loss=0.008864, over 2894571.54 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:33:50,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2023-11-27 06:33:53,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. 
limit=15.0 2023-11-27 06:33:56,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.995e+01 9.614e+01 1.020e+02 1.289e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:34:07,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3771626.6666666665, ans=0.0 2023-11-27 06:34:12,572 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-27 06:34:12,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3771626.6666666665, ans=0.125 2023-11-27 06:34:29,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3771760.0, ans=0.0 2023-11-27 06:34:41,267 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 650, loss[loss=0.06628, simple_loss=0.08565, pruned_loss=0.01403, audio_tagging_loss=0.009429, over 15016.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08998, pruned_loss=0.01204, audio_tagging_loss=0.008836, over 2930364.34 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:34:45,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=12.0 2023-11-27 06:35:08,445 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-27 06:35:14,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3772026.6666666665, ans=0.125 2023-11-27 06:35:15,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2023-11-27 06:35:22,416 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2023-11-27 06:35:37,984 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 700, loss[loss=0.05314, simple_loss=0.06779, pruned_loss=0.01215, audio_tagging_loss=0.007095, over 15071.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08943, pruned_loss=0.01186, audio_tagging_loss=0.008808, over 2960680.12 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:35:42,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3772160.0, ans=0.125 2023-11-27 06:35:49,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.967e+01 9.614e+01 1.031e+02 1.404e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:35:49,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3772226.6666666665, ans=0.025 2023-11-27 06:35:52,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3772226.6666666665, ans=0.125 2023-11-27 06:36:04,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-27 06:36:04,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3772293.3333333335, ans=0.125 2023-11-27 06:36:09,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3772293.3333333335, ans=0.125 2023-11-27 06:36:12,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3772360.0, ans=0.125 2023-11-27 06:36:16,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3772360.0, ans=0.125 2023-11-27 06:36:25,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=22.5 2023-11-27 06:36:26,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=22.5 2023-11-27 06:36:33,925 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 750, loss[loss=0.07105, simple_loss=0.09901, pruned_loss=0.01373, audio_tagging_loss=0.007817, over 15217.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.0896, pruned_loss=0.01186, audio_tagging_loss=0.008753, over 2976746.37 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:36:45,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3772560.0, ans=0.125 2023-11-27 06:36:45,381 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.34 vs. 
limit=12.0 2023-11-27 06:36:54,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3772560.0, ans=0.0 2023-11-27 06:36:57,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3772626.6666666665, ans=0.0 2023-11-27 06:37:01,101 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-27 06:37:13,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3772693.3333333335, ans=0.2 2023-11-27 06:37:16,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3772693.3333333335, ans=0.125 2023-11-27 06:37:17,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3772760.0, ans=0.125 2023-11-27 06:37:26,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3772760.0, ans=0.0 2023-11-27 06:37:29,626 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 800, loss[loss=0.05997, simple_loss=0.07849, pruned_loss=0.01094, audio_tagging_loss=0.009786, over 14388.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0897, pruned_loss=0.01206, audio_tagging_loss=0.008739, over 2991947.73 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:37:40,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.165e+01 9.801e+01 1.067e+02 1.276e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-27 06:37:42,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-27 06:37:49,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3772893.3333333335, ans=0.0 2023-11-27 06:37:56,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2023-11-27 06:37:56,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-27 06:38:03,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-27 06:38:26,149 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 850, loss[loss=0.05431, simple_loss=0.06995, pruned_loss=0.008559, audio_tagging_loss=0.01078, over 14989.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08962, pruned_loss=0.01203, audio_tagging_loss=0.008857, over 3003338.57 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:38:30,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3773160.0, ans=0.0 2023-11-27 06:38:35,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. 
limit=6.0 2023-11-27 06:38:45,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3773226.6666666665, ans=0.0 2023-11-27 06:38:52,617 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-27 06:39:00,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3773360.0, ans=0.125 2023-11-27 06:39:01,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-27 06:39:02,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3773360.0, ans=0.125 2023-11-27 06:39:02,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.79 vs. limit=15.0 2023-11-27 06:39:06,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-27 06:39:08,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-27 06:39:16,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3773426.6666666665, ans=0.125 2023-11-27 06:39:16,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0 2023-11-27 06:39:18,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3773426.6666666665, ans=0.125 2023-11-27 06:39:21,792 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 900, loss[loss=0.08593, simple_loss=0.1224, pruned_loss=0.01867, audio_tagging_loss=0.00606, over 16162.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08979, pruned_loss=0.01209, audio_tagging_loss=0.008829, over 3018690.23 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:39:28,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3773493.3333333335, ans=0.125 2023-11-27 06:39:34,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.912e+01 9.588e+01 1.041e+02 1.300e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:39:48,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3773626.6666666665, ans=0.125 2023-11-27 06:39:49,510 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-27 06:39:57,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=15.0 2023-11-27 06:40:18,021 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 950, loss[loss=0.06911, simple_loss=0.09239, pruned_loss=0.01429, audio_tagging_loss=0.00862, over 15482.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08987, pruned_loss=0.01211, audio_tagging_loss=0.008664, over 3029880.65 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:40:27,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3773826.6666666665, ans=0.125 2023-11-27 06:40:28,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3773893.3333333335, ans=0.025 2023-11-27 06:40:29,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-27 06:40:30,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3773893.3333333335, ans=0.2 2023-11-27 06:40:44,805 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-27 06:41:03,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.5 2023-11-27 06:41:04,277 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:41:06,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3774093.3333333335, ans=0.125 2023-11-27 06:41:14,673 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1000, loss[loss=0.06489, simple_loss=0.08909, pruned_loss=0.01177, audio_tagging_loss=0.008572, over 14336.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.0894, pruned_loss=0.01195, audio_tagging_loss=0.008535, over 3030892.72 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:41:26,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 9.089e+01 9.617e+01 1.045e+02 2.026e+02, threshold=1.923e+02, percent-clipped=1.0 2023-11-27 06:41:27,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3774226.6666666665, ans=0.125 2023-11-27 06:41:33,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3774226.6666666665, ans=0.0 2023-11-27 06:41:37,414 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:41:41,195 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-27 06:41:43,927 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. 
limit=15.0 2023-11-27 06:41:48,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3774360.0, ans=0.125 2023-11-27 06:41:58,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3774426.6666666665, ans=0.125 2023-11-27 06:42:08,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3774426.6666666665, ans=0.05 2023-11-27 06:42:10,052 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1050, loss[loss=0.06291, simple_loss=0.08423, pruned_loss=0.01253, audio_tagging_loss=0.008265, over 15065.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08977, pruned_loss=0.01205, audio_tagging_loss=0.008452, over 3035570.47 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:42:20,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3774560.0, ans=0.125 2023-11-27 06:42:23,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3774560.0, ans=0.0 2023-11-27 06:42:30,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3774560.0, ans=0.04949747468305833 2023-11-27 06:42:37,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-27 06:42:57,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3774760.0, ans=0.0 2023-11-27 06:43:06,173 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1100, loss[loss=0.05691, simple_loss=0.08283, pruned_loss=0.008748, audio_tagging_loss=0.006747, over 14579.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08886, pruned_loss=0.0119, audio_tagging_loss=0.008462, over 3037331.20 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:43:06,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3774826.6666666665, ans=0.025 2023-11-27 06:43:08,304 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:43:18,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.812e+01 9.589e+01 1.029e+02 1.833e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:43:33,092 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-27 06:43:33,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3774960.0, ans=0.125 2023-11-27 06:43:35,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3774960.0, ans=0.07 2023-11-27 06:43:36,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3774960.0, ans=0.0 2023-11-27 06:44:02,253 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1150, loss[loss=0.06644, simple_loss=0.08592, pruned_loss=0.01549, audio_tagging_loss=0.007991, over 15264.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08951, pruned_loss=0.01202, audio_tagging_loss=0.008383, over 3039577.31 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:44:02,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3775160.0, ans=0.125 2023-11-27 06:44:03,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3775160.0, ans=0.125 2023-11-27 06:44:11,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0 2023-11-27 06:44:28,486 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-27 06:44:38,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0 2023-11-27 06:44:57,786 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1200, loss[loss=0.06998, simple_loss=0.09908, pruned_loss=0.01259, audio_tagging_loss=0.007853, over 15819.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08964, pruned_loss=0.01202, audio_tagging_loss=0.00834, over 3041048.33 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:45:09,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 8.861e+01 9.720e+01 1.059e+02 1.366e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:45:24,958 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-27 06:45:31,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3775693.3333333335, ans=0.125 2023-11-27 06:45:42,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3775760.0, ans=0.125 2023-11-27 06:45:53,277 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1250, loss[loss=0.08016, simple_loss=0.1117, pruned_loss=0.01772, audio_tagging_loss=0.006605, over 15223.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.0903, pruned_loss=0.01214, audio_tagging_loss=0.008341, over 3039563.52 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:46:13,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3775893.3333333335, ans=0.0 2023-11-27 06:46:21,040 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-27 06:46:22,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3775960.0, ans=0.2 2023-11-27 06:46:23,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3775960.0, ans=0.0 2023-11-27 06:46:36,400 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:46:48,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-11-27 06:46:49,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3776160.0, ans=0.125 2023-11-27 06:46:50,490 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1300, loss[loss=0.07465, simple_loss=0.09947, pruned_loss=0.01676, audio_tagging_loss=0.008152, over 14524.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08979, pruned_loss=0.01195, audio_tagging_loss=0.00839, over 3039271.67 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:47:02,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.746e+01 9.407e+01 9.901e+01 1.217e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 06:47:08,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3776226.6666666665, ans=0.0 2023-11-27 06:47:08,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3776226.6666666665, ans=0.07 2023-11-27 06:47:16,613 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-27 06:47:28,556 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:47:46,233 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1350, loss[loss=0.05802, simple_loss=0.0778, pruned_loss=0.009564, audio_tagging_loss=0.00956, over 16636.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08957, pruned_loss=0.0119, audio_tagging_loss=0.008481, over 3043029.12 frames. ], batch size: 65, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:48:01,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3776560.0, ans=0.125 2023-11-27 06:48:12,893 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-27 06:48:26,737 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:48:26,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3776693.3333333335, ans=0.0 2023-11-27 06:48:28,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3776693.3333333335, ans=0.0 2023-11-27 06:48:33,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-27 06:48:41,665 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1400, loss[loss=0.06709, simple_loss=0.09412, pruned_loss=0.01304, audio_tagging_loss=0.006997, over 15263.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08983, pruned_loss=0.01193, audio_tagging_loss=0.008489, over 3049773.56 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:48:49,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3776826.6666666665, ans=0.0 2023-11-27 06:48:55,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.911e+01 9.491e+01 1.013e+02 1.381e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:49:09,742 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-27 06:49:13,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3776960.0, ans=0.125 2023-11-27 06:49:13,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2023-11-27 06:49:37,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3777160.0, ans=0.1 2023-11-27 06:49:38,342 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1450, loss[loss=0.05864, simple_loss=0.0808, pruned_loss=0.01073, audio_tagging_loss=0.007512, over 14661.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09021, pruned_loss=0.01208, audio_tagging_loss=0.008532, over 3049023.00 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:49:38,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3777160.0, ans=0.125 2023-11-27 06:49:57,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3777226.6666666665, ans=0.125 2023-11-27 06:50:05,186 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-27 06:50:14,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3777360.0, ans=0.035 2023-11-27 06:50:29,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3777426.6666666665, ans=0.1 2023-11-27 06:50:34,468 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1500, loss[loss=0.08375, simple_loss=0.1217, pruned_loss=0.01257, audio_tagging_loss=0.01033, over 16042.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09056, pruned_loss=0.01216, audio_tagging_loss=0.008633, over 3048642.99 frames. 
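Each scaling.py:213 record prints a ScheduledFloat: a module hyperparameter (balancer probabilities, skip rates, dropout_p values) whose value is a function of the global batch_count rather than a fixed constant. A sketch of the assumed semantics, a piecewise-linear schedule in batch_count; the breakpoints below are illustrative, not taken from scaling.py:

```python
# Assumed ScheduledFloat semantics behind the scaling.py:213 records:
# a float-valued, piecewise-linear schedule in global batch_count.
class ScheduledFloat:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)   # (batch_count, value) breakpoints
        self.batch_count = 0.0         # advanced by the training loop

    def __float__(self) -> float:
        p = self.points
        if self.batch_count <= p[0][0]:
            return p[0][1]
        if self.batch_count >= p[-1][0]:
            return p[-1][1]
        for (x0, y0), (x1, y1) in zip(p, p[1:]):
            if x0 <= self.batch_count <= x1:
                w = (self.batch_count - x0) / (x1 - x0)
                return y0 + w * (y1 - y0)
        raise AssertionError("unreachable for sorted breakpoints")

conv_skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
conv_skip_rate.batch_count = 3774560.0   # a batch_count from the log above
print(float(conv_skip_rate))             # 0.0: far past the final breakpoint
```

This is consistent with the records themselves: by batch_count of roughly 3.77e6 the printed values sit at late-training constants (ans=0.0, ans=0.125, and so on) rather than changing batch to batch.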
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:50:47,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.689e+01 9.111e+01 9.628e+01 1.038e+02 1.478e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 06:50:59,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3777626.6666666665, ans=0.2 2023-11-27 06:51:00,525 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-27 06:51:16,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3777693.3333333335, ans=0.1 2023-11-27 06:51:25,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-27 06:51:29,865 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1550, loss[loss=0.06015, simple_loss=0.0883, pruned_loss=0.006851, audio_tagging_loss=0.009144, over 15748.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09047, pruned_loss=0.01209, audio_tagging_loss=0.008696, over 3046786.53 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:51:35,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-27 06:51:57,615 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-27 06:51:57,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3777960.0, ans=0.0 2023-11-27 06:52:05,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3778026.6666666665, ans=0.0 2023-11-27 06:52:06,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3778026.6666666665, ans=0.95 2023-11-27 06:52:21,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-11-27 06:52:23,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3778093.3333333335, ans=0.0 2023-11-27 06:52:26,213 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1600, loss[loss=0.04876, simple_loss=0.0536, pruned_loss=0.007235, audio_tagging_loss=0.01472, over 14717.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08978, pruned_loss=0.01213, audio_tagging_loss=0.008817, over 3046534.48 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:52:33,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778160.0, ans=0.1 2023-11-27 06:52:39,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. 
limit=22.5 2023-11-27 06:52:40,791 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 9.269e+01 9.916e+01 1.064e+02 1.389e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-27 06:52:44,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3778226.6666666665, ans=0.125 2023-11-27 06:52:47,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3778226.6666666665, ans=0.125 2023-11-27 06:52:53,741 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-27 06:52:57,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.53 vs. limit=10.0 2023-11-27 06:53:12,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3778426.6666666665, ans=0.0 2023-11-27 06:53:13,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3778426.6666666665, ans=0.125 2023-11-27 06:53:23,720 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1650, loss[loss=0.08403, simple_loss=0.1267, pruned_loss=0.01472, audio_tagging_loss=0.005972, over 15892.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09005, pruned_loss=0.01213, audio_tagging_loss=0.008822, over 3047061.69 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:53:26,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3778493.3333333335, ans=0.0 2023-11-27 06:53:27,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.35 vs. limit=10.0 2023-11-27 06:53:34,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=12.0 2023-11-27 06:53:50,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-27 06:54:10,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2023-11-27 06:54:14,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3778760.0, ans=0.95 2023-11-27 06:54:19,581 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1700, loss[loss=0.05779, simple_loss=0.07935, pruned_loss=0.008424, audio_tagging_loss=0.009694, over 15498.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08968, pruned_loss=0.01203, audio_tagging_loss=0.008878, over 3047342.33 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:54:21,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.65 vs. 
limit=6.0 2023-11-27 06:54:25,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3778826.6666666665, ans=0.0 2023-11-27 06:54:25,253 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:54:34,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.972e+01 9.443e+01 1.035e+02 1.327e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 06:54:43,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3778960.0, ans=0.2 2023-11-27 06:54:47,285 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-27 06:54:52,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3779026.6666666665, ans=0.125 2023-11-27 06:54:57,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3779026.6666666665, ans=0.125 2023-11-27 06:54:58,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2023-11-27 06:55:07,630 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:55:15,424 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1750, loss[loss=0.08149, simple_loss=0.1195, pruned_loss=0.01414, audio_tagging_loss=0.007584, over 15498.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08953, pruned_loss=0.01198, audio_tagging_loss=0.008839, over 3051526.80 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:55:15,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3779160.0, ans=0.0 2023-11-27 06:55:17,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3779160.0, ans=0.125 2023-11-27 06:55:31,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3779226.6666666665, ans=0.0 2023-11-27 06:55:42,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-27 06:55:48,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-11-27 06:56:08,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3779426.6666666665, ans=0.2 2023-11-27 06:56:09,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3779426.6666666665, ans=0.2 2023-11-27 06:56:12,157 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1800, loss[loss=0.06079, simple_loss=0.08093, pruned_loss=0.01129, audio_tagging_loss=0.009042, over 14486.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08972, pruned_loss=0.01204, audio_tagging_loss=0.008719, over 3043531.25 frames. 
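The scaling.py:1022 records track a per-module whitening metric against a limit, and a value is merely logged (no penalty) while it stays below that limit, which is why entries like metric=21.11 vs. limit=22.5 pass without further action. One plausible definition, assuming the metric measures covariance anisotropy; the exact formula in scaling.py may differ:

```python
# Hedged sketch of a whitening metric like the "metric=X vs. limit=Y"
# records. Assumed definition: E[lambda^2] / E[lambda]^2 over eigenvalues
# of the per-group activation covariance; equals 1 for perfectly white
# (isotropic) activations and grows with anisotropy.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels)
    vals = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0)
        cov = (g.T @ g) / g.shape[0]
        eig = torch.linalg.eigvalsh(cov)
        vals.append((eig ** 2).mean() / eig.mean() ** 2)
    return torch.stack(vals).mean().item()

x = torch.randn(2000, 256)      # roughly white activations
m = whitening_metric(x)         # near 1 (sampling noise adds ~d/n)
print(f"metric={m:.2f} vs. limit=15.0: penalize only if metric > limit")
```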
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:56:20,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3779493.3333333335, ans=0.0 2023-11-27 06:56:24,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2023-11-27 06:56:26,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.045e+01 9.662e+01 1.034e+02 1.361e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 06:56:38,861 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-27 06:56:46,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3779693.3333333335, ans=0.0 2023-11-27 06:57:07,959 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1850, loss[loss=0.06925, simple_loss=0.09309, pruned_loss=0.0147, audio_tagging_loss=0.008002, over 15604.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.0895, pruned_loss=0.01219, audio_tagging_loss=0.008617, over 3045235.91 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:57:17,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3779893.3333333335, ans=0.125 2023-11-27 06:57:27,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3779893.3333333335, ans=0.2 2023-11-27 06:57:32,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3779960.0, ans=0.125 2023-11-27 06:57:34,700 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-27 06:57:39,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=22.5 2023-11-27 06:57:40,999 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:57:50,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3780026.6666666665, ans=0.125 2023-11-27 06:58:04,540 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1900, loss[loss=0.05028, simple_loss=0.06381, pruned_loss=0.009863, audio_tagging_loss=0.008512, over 14840.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09011, pruned_loss=0.0122, audio_tagging_loss=0.00851, over 3042362.23 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:58:19,411 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.174e+01 9.812e+01 1.054e+02 1.527e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 06:58:19,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.66 vs. 
limit=22.5 2023-11-27 06:58:29,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3780293.3333333335, ans=0.0 2023-11-27 06:58:31,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-27 06:58:53,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3780426.6666666665, ans=0.0 2023-11-27 06:59:00,542 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1950, loss[loss=0.06901, simple_loss=0.09531, pruned_loss=0.01256, audio_tagging_loss=0.008798, over 15878.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08964, pruned_loss=0.01209, audio_tagging_loss=0.008525, over 3052087.70 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:59:03,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3780493.3333333335, ans=0.0 2023-11-27 06:59:03,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3780493.3333333335, ans=0.1 2023-11-27 06:59:05,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3780493.3333333335, ans=0.07 2023-11-27 06:59:18,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3780560.0, ans=0.0 2023-11-27 06:59:22,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3780626.6666666665, ans=0.2 2023-11-27 06:59:23,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-27 06:59:24,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3780626.6666666665, ans=0.07 2023-11-27 06:59:27,380 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-27 06:59:57,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-27 06:59:57,511 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2000, loss[loss=0.06978, simple_loss=0.101, pruned_loss=0.01068, audio_tagging_loss=0.008598, over 16323.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.0887, pruned_loss=0.01196, audio_tagging_loss=0.008544, over 3052962.83 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:00:09,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3780893.3333333335, ans=0.125 2023-11-27 07:00:10,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3780893.3333333335, ans=0.125 2023-11-27 07:00:11,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.474e+01 9.102e+01 9.766e+01 1.042e+02 1.467e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 07:00:13,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.82 vs. 
limit=22.5 2023-11-27 07:00:22,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3780960.0, ans=0.125 2023-11-27 07:00:24,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-27 07:00:25,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3780960.0, ans=0.0 2023-11-27 07:00:40,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3781026.6666666665, ans=0.125 2023-11-27 07:00:42,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3781093.3333333335, ans=0.125 2023-11-27 07:00:43,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3781093.3333333335, ans=0.125 2023-11-27 07:00:51,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3781093.3333333335, ans=0.05 2023-11-27 07:00:51,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=22.5 2023-11-27 07:00:52,954 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2050, loss[loss=0.06754, simple_loss=0.09188, pruned_loss=0.01384, audio_tagging_loss=0.007758, over 16032.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08884, pruned_loss=0.01203, audio_tagging_loss=0.008501, over 3045820.31 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:01:04,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781226.6666666665, ans=0.1 2023-11-27 07:01:10,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3781226.6666666665, ans=0.125 2023-11-27 07:01:20,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-27 07:01:29,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781360.0, ans=0.1 2023-11-27 07:01:35,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3781360.0, ans=0.0 2023-11-27 07:01:38,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2023-11-27 07:01:49,453 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2100, loss[loss=0.07363, simple_loss=0.09601, pruned_loss=0.01523, audio_tagging_loss=0.01039, over 14319.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08958, pruned_loss=0.01215, audio_tagging_loss=0.008415, over 3047012.27 frames. 
], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:01:49,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3781493.3333333335, ans=0.0 2023-11-27 07:01:51,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3781493.3333333335, ans=0.125 2023-11-27 07:01:53,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781493.3333333335, ans=0.1 2023-11-27 07:02:02,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.84 vs. limit=10.0 2023-11-27 07:02:04,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.912e+01 9.473e+01 1.026e+02 1.468e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:02:16,289 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-27 07:02:36,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781760.0, ans=0.1 2023-11-27 07:02:37,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3781760.0, ans=0.0 2023-11-27 07:02:38,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3781760.0, ans=0.125 2023-11-27 07:02:45,498 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2150, loss[loss=0.07743, simple_loss=0.1052, pruned_loss=0.01771, audio_tagging_loss=0.007137, over 15320.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08891, pruned_loss=0.01197, audio_tagging_loss=0.008391, over 3043920.47 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:03:04,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3781893.3333333335, ans=0.125 2023-11-27 07:03:12,856 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-27 07:03:19,656 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:03:20,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-11-27 07:03:22,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=22.5 2023-11-27 07:03:37,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3782093.3333333335, ans=0.125 2023-11-27 07:03:41,244 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2200, loss[loss=0.09202, simple_loss=0.1281, pruned_loss=0.02034, audio_tagging_loss=0.007644, over 15348.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.0899, pruned_loss=0.01205, audio_tagging_loss=0.008436, over 3045591.11 frames. 
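The WARNING records drop AudioSet cuts whose one-second placeholder transcripts are too long for the pruned transducer loss: 100 input frames survive as only 23 frames after the convolutional front-end, fewer than the 24 BPE tokens of the dummy text. A sketch of the implied check; the frame formula below is an assumption about the front-end, though it does reproduce the printed numbers (100 frames -> 23):

```python
# Exclusion rule implied by the WARNING records: drop a cut when it has
# fewer post-subsampling frames than BPE tokens.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def should_exclude(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) < num_tokens

assert frames_after_subsampling(100) == 23
assert should_exclude(100, 24)        # the 1-second AudioSet dummy cuts
assert not should_exclude(1000, 24)   # a typical 10-second utterance
```

Only the AudioSet cuts trip this: their transcripts are always the same 24-token placeholder while the audio is a single second, so the transducer lattice would need more output symbols than it has frames.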
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:03:55,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=12.0 2023-11-27 07:03:57,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.960e+01 9.671e+01 1.061e+02 1.263e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 07:03:58,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2023-11-27 07:04:08,506 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-27 07:04:13,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3782293.3333333335, ans=0.125 2023-11-27 07:04:20,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.97 vs. limit=10.0 2023-11-27 07:04:37,763 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2250, loss[loss=0.08269, simple_loss=0.1217, pruned_loss=0.0145, audio_tagging_loss=0.007325, over 15320.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0897, pruned_loss=0.0121, audio_tagging_loss=0.008576, over 3042664.20 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:05:03,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3782626.6666666665, ans=0.0 2023-11-27 07:05:04,596 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-27 07:05:12,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3782693.3333333335, ans=0.0 2023-11-27 07:05:25,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3782760.0, ans=0.025 2023-11-27 07:05:31,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3782760.0, ans=0.125 2023-11-27 07:05:34,185 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2300, loss[loss=0.04844, simple_loss=0.05815, pruned_loss=0.006599, audio_tagging_loss=0.01277, over 14936.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08908, pruned_loss=0.01196, audio_tagging_loss=0.008658, over 3038448.40 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:05:49,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.875e+01 9.360e+01 1.027e+02 1.398e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 07:05:49,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.58 vs. 
limit=15.0 2023-11-27 07:06:01,222 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-27 07:06:01,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3782960.0, ans=0.0 2023-11-27 07:06:10,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3783026.6666666665, ans=0.2 2023-11-27 07:06:18,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3783093.3333333335, ans=0.0 2023-11-27 07:06:23,006 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:06:25,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-27 07:06:29,255 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2350, loss[loss=0.07029, simple_loss=0.1022, pruned_loss=0.01203, audio_tagging_loss=0.007161, over 14145.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08954, pruned_loss=0.01196, audio_tagging_loss=0.008662, over 3040037.41 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:06:31,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2023-11-27 07:06:34,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3783160.0, ans=10.0 2023-11-27 07:06:50,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3783226.6666666665, ans=0.125 2023-11-27 07:06:57,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-27 07:07:20,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3783426.6666666665, ans=22.5 2023-11-27 07:07:26,530 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2400, loss[loss=0.05636, simple_loss=0.07596, pruned_loss=0.007827, audio_tagging_loss=0.01055, over 14571.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.0891, pruned_loss=0.01204, audio_tagging_loss=0.008853, over 3040706.96 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:07:42,064 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.975e+01 9.647e+01 1.056e+02 1.487e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 07:07:52,939 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-27 07:08:05,047 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2023-11-27 07:08:15,929 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:08:22,604 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2450, loss[loss=0.07208, simple_loss=0.1019, pruned_loss=0.01334, audio_tagging_loss=0.007761, over 15518.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08989, pruned_loss=0.01212, audio_tagging_loss=0.008731, over 3048607.99 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:08:49,952 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-27 07:08:51,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3783960.0, ans=0.125 2023-11-27 07:09:04,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.29 vs. limit=10.0 2023-11-27 07:09:15,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3784093.3333333335, ans=0.125 2023-11-27 07:09:18,671 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2500, loss[loss=0.05139, simple_loss=0.07307, pruned_loss=0.007162, audio_tagging_loss=0.007699, over 14316.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08981, pruned_loss=0.01212, audio_tagging_loss=0.008771, over 3046718.06 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:09:32,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3784226.6666666665, ans=0.125 2023-11-27 07:09:37,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.971e+01 9.601e+01 1.047e+02 1.603e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 07:09:42,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3784293.3333333335, ans=0.125 2023-11-27 07:09:46,614 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-27 07:09:57,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3784360.0, ans=0.125 2023-11-27 07:10:16,020 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2550, loss[loss=0.05899, simple_loss=0.08086, pruned_loss=0.009291, audio_tagging_loss=0.009273, over 16204.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08934, pruned_loss=0.01198, audio_tagging_loss=0.008713, over 3049205.89 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:10:22,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3784493.3333333335, ans=0.2 2023-11-27 07:10:30,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3784560.0, ans=0.125 2023-11-27 07:10:34,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. 
limit=12.0 2023-11-27 07:10:42,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-27 07:11:04,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3784760.0, ans=0.07 2023-11-27 07:11:12,322 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2600, loss[loss=0.05715, simple_loss=0.07808, pruned_loss=0.01071, audio_tagging_loss=0.007399, over 14769.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08845, pruned_loss=0.01178, audio_tagging_loss=0.008557, over 3049848.60 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:11:29,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.922e+01 9.501e+01 1.026e+02 1.288e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 07:11:38,612 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-27 07:11:59,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3785093.3333333335, ans=0.125 2023-11-27 07:12:07,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3785160.0, ans=0.0 2023-11-27 07:12:07,896 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2650, loss[loss=0.06457, simple_loss=0.08989, pruned_loss=0.01174, audio_tagging_loss=0.00789, over 16170.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08867, pruned_loss=0.01175, audio_tagging_loss=0.008422, over 3051989.60 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:12:12,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3785160.0, ans=0.125 2023-11-27 07:12:28,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3785226.6666666665, ans=0.0 2023-11-27 07:12:35,626 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-27 07:12:38,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0 2023-11-27 07:12:42,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3785360.0, ans=0.0 2023-11-27 07:12:44,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-27 07:13:03,862 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2700, loss[loss=0.06989, simple_loss=0.1011, pruned_loss=0.01053, audio_tagging_loss=0.00881, over 16306.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0891, pruned_loss=0.01193, audio_tagging_loss=0.008314, over 3052144.96 frames. 
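The printed per-batch loss in these records matches a fixed linear combination of its parts: 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss reproduces loss to the displayed precision throughout this section (the 0.5 and 1.0 weights are inferred from the numbers alone, not read out of the training configuration). A quick consistency check against two records in this log:

```python
# Consistency check, not a quote of the training code.
def total_loss(simple: float, pruned: float, audio_tagging: float,
               simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
    return simple_scale * simple + pruned + at_scale * audio_tagging

# Epoch 48, batch 2600 record above:
assert abs(total_loss(0.07808, 0.01071, 0.007399) - 0.05715) < 5e-5
# Epoch 48, batch 1050 record earlier in the log:
assert abs(total_loss(0.08423, 0.01253, 0.008265) - 0.06291) < 5e-5
```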
], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:13:11,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3785493.3333333335, ans=0.2 2023-11-27 07:13:13,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3785493.3333333335, ans=0.125 2023-11-27 07:13:17,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3785560.0, ans=0.5 2023-11-27 07:13:19,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3785560.0, ans=0.125 2023-11-27 07:13:22,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.127e+01 9.631e+01 1.043e+02 1.339e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:13:29,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3785626.6666666665, ans=0.1 2023-11-27 07:13:31,270 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-27 07:13:32,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3785626.6666666665, ans=0.125 2023-11-27 07:14:00,788 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2750, loss[loss=0.07604, simple_loss=0.1067, pruned_loss=0.01332, audio_tagging_loss=0.009369, over 15849.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08939, pruned_loss=0.01201, audio_tagging_loss=0.008257, over 3051139.79 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:14:07,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3785826.6666666665, ans=0.0 2023-11-27 07:14:15,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2023-11-27 07:14:26,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-27 07:14:36,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3786026.6666666665, ans=0.0 2023-11-27 07:14:48,516 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:14:56,107 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2800, loss[loss=0.05535, simple_loss=0.07657, pruned_loss=0.00828, audio_tagging_loss=0.008781, over 14229.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08855, pruned_loss=0.0118, audio_tagging_loss=0.008398, over 3042283.97 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:15:04,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3786160.0, ans=0.125 2023-11-27 07:15:05,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.92 vs. 
limit=12.0 2023-11-27 07:15:05,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3786226.6666666665, ans=0.125 2023-11-27 07:15:09,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3786226.6666666665, ans=0.125 2023-11-27 07:15:13,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3786226.6666666665, ans=0.125 2023-11-27 07:15:14,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 9.123e+01 9.680e+01 1.057e+02 2.633e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-27 07:15:23,603 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-27 07:15:25,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3786293.3333333335, ans=0.0 2023-11-27 07:15:31,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2023-11-27 07:15:40,543 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:15:52,497 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2850, loss[loss=0.04739, simple_loss=0.06002, pruned_loss=0.009589, audio_tagging_loss=0.007792, over 14235.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08849, pruned_loss=0.01174, audio_tagging_loss=0.008463, over 3038684.98 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:15:53,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3786493.3333333335, ans=0.2 2023-11-27 07:16:19,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-27 07:16:50,518 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2900, loss[loss=0.07902, simple_loss=0.1063, pruned_loss=0.01874, audio_tagging_loss=0.007151, over 16281.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08812, pruned_loss=0.01166, audio_tagging_loss=0.008461, over 3050359.42 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:16:52,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3786826.6666666665, ans=0.0 2023-11-27 07:17:07,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.964e+01 9.454e+01 1.013e+02 1.177e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 07:17:16,010 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-27 07:17:45,190 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2950, loss[loss=0.06446, simple_loss=0.08699, pruned_loss=0.01105, audio_tagging_loss=0.009921, over 15180.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08871, pruned_loss=0.01171, audio_tagging_loss=0.008479, over 3048037.02 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:18:11,721 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-27 07:18:23,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3787360.0, ans=0.1 2023-11-27 07:18:24,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3787360.0, ans=0.0 2023-11-27 07:18:28,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787360.0, ans=0.1 2023-11-27 07:18:29,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3787426.6666666665, ans=0.95 2023-11-27 07:18:29,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3787426.6666666665, ans=0.125 2023-11-27 07:18:40,632 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3000, loss[loss=0.06636, simple_loss=0.09248, pruned_loss=0.01372, audio_tagging_loss=0.006403, over 14819.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0902, pruned_loss=0.01209, audio_tagging_loss=0.008429, over 3049456.39 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:18:40,632 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 07:19:10,383 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0127, 4.0548, 4.9168, 4.4803], device='cuda:2') 2023-11-27 07:19:13,018 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05781, simple_loss=0.05047, pruned_loss=0.005352, audio_tagging_loss=0.02722, over 4681554.00 frames. 2023-11-27 07:19:13,019 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 07:19:29,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-27 07:19:31,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.941e+01 8.979e+01 9.616e+01 1.040e+02 1.231e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 07:19:39,380 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-27 07:20:05,477 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:20:08,444 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3050, loss[loss=0.06846, simple_loss=0.1002, pruned_loss=0.01049, audio_tagging_loss=0.007861, over 15182.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09126, pruned_loss=0.01222, audio_tagging_loss=0.008515, over 3051167.40 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:20:35,629 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-27 07:20:40,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3787960.0, ans=0.125 2023-11-27 07:20:41,567 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:20:53,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3788093.3333333335, ans=0.0 2023-11-27 07:20:56,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3788093.3333333335, ans=0.125 2023-11-27 07:21:04,509 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3100, loss[loss=0.08972, simple_loss=0.1295, pruned_loss=0.01824, audio_tagging_loss=0.006717, over 15682.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09108, pruned_loss=0.01232, audio_tagging_loss=0.008557, over 3047311.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:21:14,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788226.6666666665, ans=0.1 2023-11-27 07:21:24,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 9.299e+01 9.781e+01 1.036e+02 1.255e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-27 07:21:26,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3788293.3333333335, ans=0.0 2023-11-27 07:21:28,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2023-11-27 07:21:31,531 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-27 07:21:57,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3788426.6666666665, ans=0.125 2023-11-27 07:22:00,753 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3150, loss[loss=0.0627, simple_loss=0.08598, pruned_loss=0.01087, audio_tagging_loss=0.008847, over 15581.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09124, pruned_loss=0.01222, audio_tagging_loss=0.00859, over 3049217.21 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:22:06,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3788493.3333333335, ans=0.0 2023-11-27 07:22:26,896 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-27 07:22:45,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3788760.0, ans=0.1 2023-11-27 07:22:54,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3788760.0, ans=0.0 2023-11-27 07:22:56,757 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3200, loss[loss=0.06027, simple_loss=0.07597, pruned_loss=0.01128, audio_tagging_loss=0.011, over 15653.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.0902, pruned_loss=0.01206, audio_tagging_loss=0.008669, over 3048193.82 frames. 
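During the periodic validation pass, zipformer.py dumps attn_weights_entropy tensors such as the four-element tensor([5.0127, 4.0548, 4.9168, 4.4803]) above, plausibly one value per attention head of the named self_attn_weights module. A hedged sketch of that diagnostic, assuming plain Shannon entropy of each head's attention distribution averaged over query positions; the exact reduction in zipformer.py may differ:

```python
# Assumed definition of the attn_weights_entropy diagnostic: per-head
# Shannon entropy of the attention weights, averaged over queries.
# Higher values mean more diffuse (less peaked) attention.
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), rows summing to 1
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)   # one scalar per head

attn = torch.softmax(torch.randn(4, 100, 200), dim=-1)
print(attn_weights_entropy(attn))   # diffuse heads approach log(200) ~ 5.3
```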
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:23:01,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3788826.6666666665, ans=0.04949747468305833 2023-11-27 07:23:12,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3788893.3333333335, ans=0.125 2023-11-27 07:23:15,178 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 9.115e+01 9.612e+01 1.036e+02 1.415e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 07:23:15,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2023-11-27 07:23:17,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3788893.3333333335, ans=0.125 2023-11-27 07:23:17,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788893.3333333335, ans=0.1 2023-11-27 07:23:18,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2023-11-27 07:23:22,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3788960.0, ans=0.1 2023-11-27 07:23:23,218 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-27 07:23:40,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3789093.3333333335, ans=0.1 2023-11-27 07:23:43,411 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:23:46,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3789093.3333333335, ans=0.125 2023-11-27 07:23:51,841 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3250, loss[loss=0.07267, simple_loss=0.1101, pruned_loss=0.01283, audio_tagging_loss=0.004803, over 15270.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08969, pruned_loss=0.01195, audio_tagging_loss=0.00886, over 3052462.24 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:24:04,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3789226.6666666665, ans=0.1 2023-11-27 07:24:06,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3789226.6666666665, ans=0.1 2023-11-27 07:24:06,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3789226.6666666665, ans=0.125 2023-11-27 07:24:15,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.79 vs. 
limit=15.0 2023-11-27 07:24:19,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-27 07:24:20,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3789293.3333333335, ans=0.0 2023-11-27 07:24:31,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3789360.0, ans=0.0 2023-11-27 07:24:37,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3789426.6666666665, ans=0.125 2023-11-27 07:24:48,703 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3300, loss[loss=0.07298, simple_loss=0.1048, pruned_loss=0.01309, audio_tagging_loss=0.007482, over 15082.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08928, pruned_loss=0.01187, audio_tagging_loss=0.008884, over 3052488.66 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:24:53,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3789493.3333333335, ans=0.1 2023-11-27 07:25:07,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 9.209e+01 1.006e+02 1.091e+02 1.432e+02, threshold=2.012e+02, percent-clipped=0.0 2023-11-27 07:25:14,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3789626.6666666665, ans=0.0 2023-11-27 07:25:15,508 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-27 07:25:27,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3789693.3333333335, ans=0.0 2023-11-27 07:25:44,965 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3350, loss[loss=0.08177, simple_loss=0.111, pruned_loss=0.01759, audio_tagging_loss=0.008674, over 15147.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08931, pruned_loss=0.01184, audio_tagging_loss=0.008763, over 3053831.16 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:25:48,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3789826.6666666665, ans=0.0 2023-11-27 07:25:54,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2023-11-27 07:25:57,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.93 vs. 
limit=15.0 2023-11-27 07:25:59,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3789893.3333333335, ans=0.125 2023-11-27 07:26:05,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3789893.3333333335, ans=0.0 2023-11-27 07:26:12,277 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-27 07:26:18,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3790026.6666666665, ans=0.125 2023-11-27 07:26:34,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3790093.3333333335, ans=0.125 2023-11-27 07:26:40,915 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3400, loss[loss=0.06579, simple_loss=0.08836, pruned_loss=0.01387, audio_tagging_loss=0.007742, over 15766.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08915, pruned_loss=0.01185, audio_tagging_loss=0.008663, over 3053054.84 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:26:42,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2023-11-27 07:27:00,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 9.061e+01 9.680e+01 1.035e+02 1.293e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 07:27:08,318 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-27 07:27:11,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3790293.3333333335, ans=0.0 2023-11-27 07:27:28,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3790426.6666666665, ans=0.0 2023-11-27 07:27:29,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-27 07:27:33,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790426.6666666665, ans=0.1 2023-11-27 07:27:37,468 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3450, loss[loss=0.05837, simple_loss=0.08259, pruned_loss=0.00901, audio_tagging_loss=0.008066, over 15350.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09007, pruned_loss=0.01199, audio_tagging_loss=0.008506, over 3047224.62 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:27:55,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3790560.0, ans=0.125 2023-11-27 07:28:00,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3790626.6666666665, ans=0.09899494936611666 2023-11-27 07:28:01,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3790626.6666666665, ans=0.125 2023-11-27 07:28:01,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.21 vs. 
limit=15.0 2023-11-27 07:28:04,045 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-27 07:28:18,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3790693.3333333335, ans=0.0 2023-11-27 07:28:29,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=12.0 2023-11-27 07:28:33,473 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3500, loss[loss=0.06332, simple_loss=0.08767, pruned_loss=0.01352, audio_tagging_loss=0.005962, over 14896.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08958, pruned_loss=0.01179, audio_tagging_loss=0.008447, over 3049918.18 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:28:52,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.044e+01 9.629e+01 1.032e+02 1.263e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:29:00,472 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-27 07:29:02,552 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:29:11,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3791026.6666666665, ans=0.2 2023-11-27 07:29:29,116 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3550, loss[loss=0.04857, simple_loss=0.06581, pruned_loss=0.005025, audio_tagging_loss=0.01064, over 14743.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08945, pruned_loss=0.01177, audio_tagging_loss=0.008453, over 3041614.35 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:29:30,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.73 vs. limit=12.0 2023-11-27 07:29:43,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2023-11-27 07:29:56,244 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-27 07:30:21,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3791426.6666666665, ans=0.0 2023-11-27 07:30:25,371 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3600, loss[loss=0.05767, simple_loss=0.0825, pruned_loss=0.006244, audio_tagging_loss=0.01018, over 14832.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08909, pruned_loss=0.0119, audio_tagging_loss=0.008538, over 3044299.74 frames. 
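The WARNING above shows why 1-second AudioSet cuts carrying the dummy transcript get dropped: 100 input frames shrink to 23 at the encoder output, fewer than the 24 BPE tokens, so no monotonic transducer alignment exists. A sketch of that filter follows; the subsampling formula is an assumption, chosen only because it reproduces the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed convolutional-subsampling length formula (maps 100 -> 23).
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # -> 23
    print(keep_cut(100, 24))              # -> False: cut is excluded from training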
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:30:28,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3791493.3333333335, ans=0.125 2023-11-27 07:30:43,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.813e+01 9.473e+01 1.021e+02 1.241e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:30:51,200 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-27 07:31:06,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3791693.3333333335, ans=0.125 2023-11-27 07:31:12,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0 2023-11-27 07:31:20,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3791826.6666666665, ans=0.0 2023-11-27 07:31:20,979 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3650, loss[loss=0.07033, simple_loss=0.09634, pruned_loss=0.01115, audio_tagging_loss=0.011, over 15585.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08903, pruned_loss=0.01196, audio_tagging_loss=0.008489, over 3045245.75 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:31:36,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3791893.3333333335, ans=0.0 2023-11-27 07:31:39,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3791893.3333333335, ans=0.0 2023-11-27 07:31:48,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-27 07:31:58,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3792026.6666666665, ans=0.1 2023-11-27 07:32:06,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3792093.3333333335, ans=0.125 2023-11-27 07:32:07,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3792093.3333333335, ans=0.0 2023-11-27 07:32:16,489 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3700, loss[loss=0.04174, simple_loss=0.05249, pruned_loss=0.006355, audio_tagging_loss=0.009142, over 15011.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08985, pruned_loss=0.01205, audio_tagging_loss=0.008433, over 3047926.80 frames. 
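Each optim.py line summarizes the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max) plus the clipping threshold. Across these records the threshold tracks Clipping_scale times the logged median, e.g. 2.0 x 9.744e+01 ~= 1.949e+02 in the record above, and percent-clipped stays at 0.0, meaning no batch in the window exceeded it. A sketch of that bookkeeping (illustrative, not icefall's optimizer):

    import torch

    def clipping_stats(recent_norms, clipping_scale=2.0):
        norms = torch.tensor(recent_norms)
        quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]               # scale * median
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return quartiles, threshold.item(), percent_clipped.item()

    q, thr, pc = clipping_stats([73.35, 90.68, 97.44, 104.9, 119.1])
    print(thr, pc)  # ~194.9 and 0.0, cf. threshold=1.949e+02, percent-clipped=0.0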
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:32:17,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3792160.0, ans=0.125 2023-11-27 07:32:37,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 9.068e+01 9.744e+01 1.049e+02 1.191e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:32:42,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3792293.3333333335, ans=22.5 2023-11-27 07:32:44,051 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-27 07:32:44,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3792293.3333333335, ans=0.0 2023-11-27 07:33:13,046 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3750, loss[loss=0.04355, simple_loss=0.04916, pruned_loss=0.009476, audio_tagging_loss=0.009498, over 14556.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09036, pruned_loss=0.01208, audio_tagging_loss=0.008406, over 3055224.01 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:33:15,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3792493.3333333335, ans=0.125 2023-11-27 07:33:24,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3792560.0, ans=0.04949747468305833 2023-11-27 07:33:26,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3792560.0, ans=0.0 2023-11-27 07:33:39,741 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-27 07:33:39,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3792626.6666666665, ans=0.0 2023-11-27 07:33:40,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3792626.6666666665, ans=0.0 2023-11-27 07:33:51,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3792693.3333333335, ans=0.1 2023-11-27 07:33:52,679 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:33:52,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3792693.3333333335, ans=0.125 2023-11-27 07:33:53,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2023-11-27 07:33:58,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3792760.0, ans=0.0 2023-11-27 07:34:02,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. 
limit=22.5 2023-11-27 07:34:02,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3792760.0, ans=0.5 2023-11-27 07:34:09,552 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3800, loss[loss=0.04984, simple_loss=0.07472, pruned_loss=0.005018, audio_tagging_loss=0.00746, over 14836.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09067, pruned_loss=0.01218, audio_tagging_loss=0.008507, over 3053256.55 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:34:27,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3792893.3333333335, ans=0.0 2023-11-27 07:34:28,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.318e+01 9.986e+01 1.074e+02 1.810e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-27 07:34:31,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3792960.0, ans=0.125 2023-11-27 07:34:35,960 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-27 07:34:37,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3792960.0, ans=0.2 2023-11-27 07:34:39,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3792960.0, ans=0.125 2023-11-27 07:34:45,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3793026.6666666665, ans=0.125 2023-11-27 07:34:47,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3793026.6666666665, ans=0.125 2023-11-27 07:35:00,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3793093.3333333335, ans=0.1 2023-11-27 07:35:01,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3793093.3333333335, ans=0.125 2023-11-27 07:35:03,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3793160.0, ans=0.2 2023-11-27 07:35:04,402 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3850, loss[loss=0.04977, simple_loss=0.06669, pruned_loss=0.005971, audio_tagging_loss=0.01046, over 14032.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09065, pruned_loss=0.0121, audio_tagging_loss=0.008617, over 3048088.37 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:35:14,299 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=22.5 2023-11-27 07:35:32,208 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-27 07:35:32,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. 
limit=15.0 2023-11-27 07:35:39,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3793360.0, ans=0.125 2023-11-27 07:36:00,489 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3900, loss[loss=0.06768, simple_loss=0.08776, pruned_loss=0.01415, audio_tagging_loss=0.009653, over 13441.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09007, pruned_loss=0.01208, audio_tagging_loss=0.008675, over 3043503.26 frames. ], batch size: 50, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:36:21,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2023-11-27 07:36:22,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.249e+01 9.619e+01 1.030e+02 1.289e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 07:36:27,694 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-27 07:36:32,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3793626.6666666665, ans=0.125 2023-11-27 07:36:36,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2023-11-27 07:36:41,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3793693.3333333335, ans=0.125 2023-11-27 07:36:45,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3793760.0, ans=0.125 2023-11-27 07:36:56,874 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3950, loss[loss=0.08396, simple_loss=0.1113, pruned_loss=0.02113, audio_tagging_loss=0.007169, over 15282.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.0902, pruned_loss=0.0121, audio_tagging_loss=0.00872, over 3040110.81 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:37:07,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=22.5 2023-11-27 07:37:10,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2023-11-27 07:37:23,056 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-27 07:37:28,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3794026.6666666665, ans=0.125 2023-11-27 07:37:41,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3794093.3333333335, ans=0.07 2023-11-27 07:37:52,206 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4000, loss[loss=0.07823, simple_loss=0.1018, pruned_loss=0.01766, audio_tagging_loss=0.009669, over 14950.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0901, pruned_loss=0.0122, audio_tagging_loss=0.0088, over 3034850.27 frames. 
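The Whitening lines compare a per-module statistic against a limit (metric=7.78 vs. limit=15.0 and similar). A plausible reading, sketched below as an assumption about what scaling.py measures, is an eigenvalue-spread ratio of the feature covariance: the mean squared eigenvalue divided by the squared mean eigenvalue, which equals 1.0 for perfectly whitened features and grows when a few directions dominate, so a larger metric means less white:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]             # channel covariance, (C, C)
        d = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()    # trace(cov) / d
        mean_sq_eig = (cov * cov).sum() / d      # trace(cov^2) / d (cov symmetric)
        return (mean_sq_eig / mean_eig.clamp(min=1e-20) ** 2).item()

    print(whitening_metric(torch.randn(2000, 256)))  # ~1 for near-white features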
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:37:53,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3794160.0, ans=0.0 2023-11-27 07:38:04,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-27 07:38:08,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3794226.6666666665, ans=0.125 2023-11-27 07:38:13,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.122e+01 1.001e+02 1.075e+02 1.362e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-27 07:38:17,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3794293.3333333335, ans=0.125 2023-11-27 07:38:20,073 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-27 07:38:48,288 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4050, loss[loss=0.06629, simple_loss=0.09337, pruned_loss=0.01005, audio_tagging_loss=0.009557, over 14309.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08942, pruned_loss=0.01211, audio_tagging_loss=0.008838, over 3033106.39 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:38:48,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3794493.3333333335, ans=0.0 2023-11-27 07:38:51,493 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:38:56,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3794493.3333333335, ans=0.125 2023-11-27 07:39:03,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3794560.0, ans=0.0 2023-11-27 07:39:07,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3794560.0, ans=0.1 2023-11-27 07:39:15,557 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-27 07:39:16,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3794626.6666666665, ans=0.95 2023-11-27 07:39:18,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3794626.6666666665, ans=0.1 2023-11-27 07:39:39,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2023-11-27 07:39:44,865 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4100, loss[loss=0.06427, simple_loss=0.08651, pruned_loss=0.01344, audio_tagging_loss=0.00758, over 14014.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08972, pruned_loss=0.01209, audio_tagging_loss=0.008752, over 3035831.99 frames. 
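The loss fields in these records are internally consistent: the logged loss matches 0.5 * simple_loss + pruned_loss + audio_tagging_loss throughout the section (for the batch-4100 record above, 0.5 * 0.08651 + 0.01344 + 0.00758 ~= 0.06428 vs. the logged 0.06427). The 0.5 weight on the simple (linear) transducer loss is inferred from these numbers, so treat it as an assumption:

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Combination reverse-engineered from the logged values above.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(combined_loss(0.08651, 0.01344, 0.00758))  # ~0.06428, cf. loss=0.06427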
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:39:51,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3794826.6666666665, ans=0.125 2023-11-27 07:39:54,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2023-11-27 07:40:02,615 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:40:05,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 9.057e+01 9.659e+01 1.032e+02 2.111e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-27 07:40:09,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3794960.0, ans=0.0 2023-11-27 07:40:11,107 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-27 07:40:13,916 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:40:21,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3795026.6666666665, ans=0.0 2023-11-27 07:40:31,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3795093.3333333335, ans=0.2 2023-11-27 07:40:40,541 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4150, loss[loss=0.05738, simple_loss=0.08426, pruned_loss=0.008246, audio_tagging_loss=0.007004, over 14849.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.0898, pruned_loss=0.01221, audio_tagging_loss=0.008546, over 3028730.15 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:41:07,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-27 07:41:19,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3795360.0, ans=0.0 2023-11-27 07:41:22,165 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:41:27,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3795426.6666666665, ans=0.125 2023-11-27 07:41:35,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3795493.3333333335, ans=0.2 2023-11-27 07:41:36,069 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4200, loss[loss=0.0654, simple_loss=0.08802, pruned_loss=0.01254, audio_tagging_loss=0.008856, over 15299.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08956, pruned_loss=0.0121, audio_tagging_loss=0.008442, over 3030031.05 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:41:49,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3795560.0, ans=0.125 2023-11-27 07:41:58,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 9.145e+01 9.853e+01 1.047e+02 1.662e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 07:41:59,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-27 07:42:00,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-27 07:42:02,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3795626.6666666665, ans=0.2 2023-11-27 07:42:03,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-27 07:42:06,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3795626.6666666665, ans=0.1 2023-11-27 07:42:09,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3795693.3333333335, ans=0.2 2023-11-27 07:42:13,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3795693.3333333335, ans=0.0 2023-11-27 07:42:15,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2023-11-27 07:42:19,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2023-11-27 07:42:32,519 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4250, loss[loss=0.0634, simple_loss=0.08781, pruned_loss=0.01007, audio_tagging_loss=0.009429, over 15250.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08869, pruned_loss=0.01195, audio_tagging_loss=0.008434, over 3026600.38 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:42:44,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3795893.3333333335, ans=0.0 2023-11-27 07:42:47,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3795893.3333333335, ans=0.2 2023-11-27 07:42:59,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-27 07:43:07,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3796026.6666666665, ans=0.125 2023-11-27 07:43:14,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=15.0 2023-11-27 07:43:20,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3796093.3333333335, ans=0.0 2023-11-27 07:43:21,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3796093.3333333335, ans=0.0 2023-11-27 07:43:29,339 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4300, loss[loss=0.07512, simple_loss=0.1095, pruned_loss=0.01353, audio_tagging_loss=0.006853, over 15012.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08977, pruned_loss=0.01213, audio_tagging_loss=0.00835, over 3036102.53 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:43:33,760 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:43:33,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3796160.0, ans=0.1 2023-11-27 07:43:36,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3796160.0, ans=0.125 2023-11-27 07:43:49,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2023-11-27 07:43:50,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.134e+01 9.847e+01 1.057e+02 1.328e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 07:43:50,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3796293.3333333335, ans=0.0 2023-11-27 07:43:56,061 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-27 07:44:19,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2023-11-27 07:44:24,883 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4350, loss[loss=0.05098, simple_loss=0.06207, pruned_loss=0.006921, audio_tagging_loss=0.01303, over 16193.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08895, pruned_loss=0.01183, audio_tagging_loss=0.008405, over 3040225.57 frames. ], batch size: 64, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:44:25,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.12 vs. 
limit=12.0 2023-11-27 07:44:45,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3796560.0, ans=0.2 2023-11-27 07:44:47,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3796626.6666666665, ans=0.0 2023-11-27 07:44:52,278 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-27 07:44:52,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3796626.6666666665, ans=0.125 2023-11-27 07:44:53,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3796626.6666666665, ans=0.125 2023-11-27 07:44:59,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3796693.3333333335, ans=0.125 2023-11-27 07:45:02,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3796693.3333333335, ans=22.5 2023-11-27 07:45:11,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3796760.0, ans=0.0 2023-11-27 07:45:20,923 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4400, loss[loss=0.06625, simple_loss=0.09753, pruned_loss=0.0103, audio_tagging_loss=0.007185, over 15638.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08941, pruned_loss=0.01184, audio_tagging_loss=0.008452, over 3040013.60 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:45:21,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3796826.6666666665, ans=0.04949747468305833 2023-11-27 07:45:42,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.963e+01 9.534e+01 1.011e+02 1.280e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 07:45:48,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-27 07:45:50,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796960.0, ans=0.1 2023-11-27 07:45:51,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3796960.0, ans=0.125 2023-11-27 07:46:00,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3797026.6666666665, ans=0.1 2023-11-27 07:46:04,270 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-11-27 07:46:17,083 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4450, loss[loss=0.1041, simple_loss=0.15, pruned_loss=0.02144, audio_tagging_loss=0.007663, over 17276.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.0905, pruned_loss=0.01203, audio_tagging_loss=0.008346, over 3045318.45 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:46:33,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3797226.6666666665, ans=0.125 2023-11-27 07:46:39,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3797293.3333333335, ans=0.125 2023-11-27 07:46:43,684 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-27 07:46:47,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-27 07:46:58,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3797360.0, ans=0.0 2023-11-27 07:47:07,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3797426.6666666665, ans=0.125 2023-11-27 07:47:12,974 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4500, loss[loss=0.07, simple_loss=0.09635, pruned_loss=0.01349, audio_tagging_loss=0.008333, over 14803.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09077, pruned_loss=0.01201, audio_tagging_loss=0.008314, over 3047464.96 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:47:15,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3797493.3333333335, ans=0.2 2023-11-27 07:47:15,304 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:47:35,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 9.073e+01 9.744e+01 1.047e+02 1.558e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:47:40,051 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-27 07:47:40,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797626.6666666665, ans=0.1 2023-11-27 07:48:08,488 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4550, loss[loss=0.06962, simple_loss=0.09667, pruned_loss=0.01224, audio_tagging_loss=0.00904, over 13913.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09123, pruned_loss=0.01214, audio_tagging_loss=0.008338, over 3043546.91 frames. 
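The grad_scale field in the loss records is the automatic-mixed-precision loss scale, which explains why it steps between 8.0, 16.0 and 32.0 over this section: PyTorch's GradScaler multiplies the scale up after a run of overflow-free steps and halves it whenever a step produces inf/nan gradients. Schematic usage (compute_loss and optimizer are placeholders):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    # Inside the training loop (schematic):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)   # hypothetical helper
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()  # grows after a run of clean steps, shrinks on overflow
    print(scaler.get_scale())  # 16.0 (on a CUDA machine)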
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:48:16,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3797826.6666666665, ans=0.125 2023-11-27 07:48:35,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-27 07:48:35,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3797960.0, ans=0.125 2023-11-27 07:48:35,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3797960.0, ans=0.125 2023-11-27 07:48:43,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3798026.6666666665, ans=0.07 2023-11-27 07:48:46,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3798026.6666666665, ans=0.125 2023-11-27 07:48:48,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3798026.6666666665, ans=0.125 2023-11-27 07:48:52,300 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:49:02,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3798093.3333333335, ans=0.125 2023-11-27 07:49:05,193 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4600, loss[loss=0.06152, simple_loss=0.07587, pruned_loss=0.01253, audio_tagging_loss=0.01106, over 14102.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09125, pruned_loss=0.01226, audio_tagging_loss=0.008415, over 3040731.39 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:49:25,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3798226.6666666665, ans=0.0 2023-11-27 07:49:27,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.803e+01 9.390e+01 1.011e+02 1.144e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 07:49:32,015 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-27 07:49:54,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3798426.6666666665, ans=0.1 2023-11-27 07:50:00,868 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4650, loss[loss=0.07341, simple_loss=0.1048, pruned_loss=0.01516, audio_tagging_loss=0.005848, over 15057.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09176, pruned_loss=0.01218, audio_tagging_loss=0.008428, over 3050162.36 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:50:06,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3798493.3333333335, ans=0.125 2023-11-27 07:50:08,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3798493.3333333335, ans=0.125 2023-11-27 07:50:17,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3798560.0, ans=0.125 2023-11-27 07:50:28,632 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-27 07:50:38,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3798693.3333333335, ans=0.1 2023-11-27 07:50:44,073 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:50:52,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3798760.0, ans=0.125 2023-11-27 07:50:56,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3798826.6666666665, ans=0.125 2023-11-27 07:50:57,440 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4700, loss[loss=0.05222, simple_loss=0.07565, pruned_loss=0.006814, audio_tagging_loss=0.007583, over 14891.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09195, pruned_loss=0.01218, audio_tagging_loss=0.00858, over 3049947.07 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:51:05,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.01 vs. limit=10.0 2023-11-27 07:51:07,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3798826.6666666665, ans=0.1 2023-11-27 07:51:10,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3798893.3333333335, ans=0.125 2023-11-27 07:51:19,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.828e+01 9.714e+01 1.039e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:51:23,961 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-27 07:51:33,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3799026.6666666665, ans=0.125 2023-11-27 07:51:48,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-27 07:51:53,639 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4750, loss[loss=0.07369, simple_loss=0.09838, pruned_loss=0.01456, audio_tagging_loss=0.009942, over 15866.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09047, pruned_loss=0.01195, audio_tagging_loss=0.008718, over 3049177.56 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:52:01,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3799160.0, ans=0.125 2023-11-27 07:52:01,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3799160.0, ans=0.0 2023-11-27 07:52:05,644 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:52:14,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3799293.3333333335, ans=0.1 2023-11-27 07:52:20,337 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-27 07:52:30,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3799360.0, ans=0.125 2023-11-27 07:52:36,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3799360.0, ans=0.09899494936611666 2023-11-27 07:52:40,660 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:52:47,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3799493.3333333335, ans=0.0 2023-11-27 07:52:48,852 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4800, loss[loss=0.04918, simple_loss=0.06373, pruned_loss=0.008036, audio_tagging_loss=0.009276, over 15083.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08982, pruned_loss=0.01192, audio_tagging_loss=0.008829, over 3055204.57 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:53:07,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=12.0 2023-11-27 07:53:11,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 9.043e+01 9.728e+01 1.036e+02 1.523e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 07:53:16,496 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-27 07:53:32,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3799760.0, ans=0.125 2023-11-27 07:53:45,019 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4850, loss[loss=0.04211, simple_loss=0.05442, pruned_loss=0.003889, audio_tagging_loss=0.01101, over 15508.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08951, pruned_loss=0.01198, audio_tagging_loss=0.008949, over 3057979.32 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:53:50,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3799826.6666666665, ans=0.125 2023-11-27 07:54:02,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.59 vs. 
limit=15.0 2023-11-27 07:54:11,711 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-27 07:54:27,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3800026.6666666665, ans=0.0 2023-11-27 07:54:32,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3800093.3333333335, ans=0.2 2023-11-27 07:54:41,096 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4900, loss[loss=0.06322, simple_loss=0.08878, pruned_loss=0.01133, audio_tagging_loss=0.0075, over 14872.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08891, pruned_loss=0.01206, audio_tagging_loss=0.008963, over 3047018.51 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:54:46,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3800160.0, ans=0.0 2023-11-27 07:55:02,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.110e+01 9.715e+01 1.029e+02 1.331e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:55:06,616 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-27 07:55:08,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3800293.3333333335, ans=0.2 2023-11-27 07:55:14,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3800360.0, ans=0.125 2023-11-27 07:55:20,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3800360.0, ans=0.04949747468305833 2023-11-27 07:55:36,375 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4950, loss[loss=0.07021, simple_loss=0.1005, pruned_loss=0.01166, audio_tagging_loss=0.008311, over 15837.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08923, pruned_loss=0.01208, audio_tagging_loss=0.008808, over 3046696.95 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:55:41,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3800493.3333333335, ans=0.025 2023-11-27 07:55:49,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3800560.0, ans=0.0 2023-11-27 07:56:04,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-27 07:56:07,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3800626.6666666665, ans=0.05 2023-11-27 07:56:31,920 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5000, loss[loss=0.05827, simple_loss=0.07802, pruned_loss=0.01195, audio_tagging_loss=0.007307, over 14675.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08856, pruned_loss=0.01208, audio_tagging_loss=0.00868, over 3046108.95 frames. 
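The balancer entries (prob, min_positive, max_positive, min_abs, ...) come from activation-balancing modules whose job is to keep per-channel statistics inside a target range, e.g. the fraction of positive activations above the min_positive=0.025 seen in the record above. The statistics involved can be sketched as below; the real module enforces the constraints by adjusting gradients, which is not reproduced here:

    import torch

    def channel_stats(x: torch.Tensor):
        # x: (num_frames, num_channels)
        frac_positive = (x > 0).float().mean(dim=0)  # fraction positive per channel
        mean_abs = x.abs().mean(dim=0)               # mean magnitude per channel
        return frac_positive, mean_abs

    frac_positive, mean_abs = channel_stats(torch.randn(1000, 384))
    print((frac_positive < 0.025).sum().item())  # channels violating min_positive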
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:56:50,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3800893.3333333335, ans=0.2 2023-11-27 07:56:51,251 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:56:53,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3800893.3333333335, ans=0.0 2023-11-27 07:56:54,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3800960.0, ans=0.125 2023-11-27 07:56:55,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.884e+01 9.444e+01 1.042e+02 1.203e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 07:56:55,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3800960.0, ans=0.125 2023-11-27 07:56:59,626 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-27 07:57:08,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3801026.6666666665, ans=0.125 2023-11-27 07:57:21,115 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:57:23,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0 2023-11-27 07:57:29,492 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5050, loss[loss=0.05876, simple_loss=0.07496, pruned_loss=0.01272, audio_tagging_loss=0.008558, over 15348.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08895, pruned_loss=0.01201, audio_tagging_loss=0.008579, over 3042222.56 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:57:38,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-11-27 07:57:42,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3801226.6666666665, ans=0.04949747468305833 2023-11-27 07:57:42,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2023-11-27 07:57:52,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3801293.3333333335, ans=0.05 2023-11-27 07:57:52,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801293.3333333335, ans=0.1 2023-11-27 07:57:54,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-27 07:58:02,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3801360.0, ans=0.0 2023-11-27 07:58:18,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3801426.6666666665, ans=0.0 2023-11-27 07:58:24,946 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5100, loss[loss=0.07296, simple_loss=0.09665, pruned_loss=0.01431, audio_tagging_loss=0.01032, over 16826.00 frames. 
], tot_loss[loss=0.06494, simple_loss=0.08883, pruned_loss=0.012, audio_tagging_loss=0.008519, over 3040902.47 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:58:26,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3801493.3333333335, ans=0.0 2023-11-27 07:58:31,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3801493.3333333335, ans=0.09899494936611666 2023-11-27 07:58:32,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3801493.3333333335, ans=0.125 2023-11-27 07:58:35,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3801560.0, ans=0.125 2023-11-27 07:58:42,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-11-27 07:58:46,397 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.907e+01 8.861e+01 9.486e+01 1.041e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 07:58:51,779 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-27 07:58:56,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3801626.6666666665, ans=0.0 2023-11-27 07:59:00,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2023-11-27 07:59:02,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3801693.3333333335, ans=0.2 2023-11-27 07:59:05,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3801693.3333333335, ans=0.015 2023-11-27 07:59:12,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3801760.0, ans=0.125 2023-11-27 07:59:19,740 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5150, loss[loss=0.05139, simple_loss=0.06103, pruned_loss=0.0105, audio_tagging_loss=0.01038, over 14859.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08847, pruned_loss=0.01177, audio_tagging_loss=0.008516, over 3041682.27 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:59:21,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-27 07:59:26,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-27 07:59:46,783 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-27 07:59:54,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3802026.6666666665, ans=0.125 2023-11-27 08:00:12,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. 
limit=6.0 2023-11-27 08:00:15,511 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5200, loss[loss=0.05252, simple_loss=0.07087, pruned_loss=0.007885, audio_tagging_loss=0.009198, over 15337.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08916, pruned_loss=0.01184, audio_tagging_loss=0.008436, over 3042229.95 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 08:00:30,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3802226.6666666665, ans=0.125 2023-11-27 08:00:39,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.152e+01 9.640e+01 1.026e+02 1.239e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:00:42,060 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-27 08:00:44,378 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:00:45,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3802293.3333333335, ans=0.0 2023-11-27 08:01:07,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3802426.6666666665, ans=0.1 2023-11-27 08:01:11,803 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5250, loss[loss=0.04949, simple_loss=0.06565, pruned_loss=0.006069, audio_tagging_loss=0.0106, over 14756.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08941, pruned_loss=0.01191, audio_tagging_loss=0.008341, over 3041003.18 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:01:24,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3802560.0, ans=0.125 2023-11-27 08:01:38,159 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-27 08:01:41,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2023-11-27 08:01:51,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3802693.3333333335, ans=0.1 2023-11-27 08:02:01,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3802760.0, ans=0.0 2023-11-27 08:02:05,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3802760.0, ans=0.0 2023-11-27 08:02:07,492 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5300, loss[loss=0.06284, simple_loss=0.0941, pruned_loss=0.00922, audio_tagging_loss=0.006571, over 15638.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08966, pruned_loss=0.01182, audio_tagging_loss=0.008223, over 3041231.80 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:02:12,316 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. 
limit=15.0 2023-11-27 08:02:13,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3802826.6666666665, ans=0.09899494936611666 2023-11-27 08:02:18,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3802893.3333333335, ans=0.2 2023-11-27 08:02:26,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3802893.3333333335, ans=0.0 2023-11-27 08:02:29,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=15.0 2023-11-27 08:02:33,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 9.130e+01 9.779e+01 1.044e+02 2.518e+02, threshold=1.956e+02, percent-clipped=1.0 2023-11-27 08:02:35,422 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-27 08:02:37,674 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:02:40,870 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:02:41,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3803026.6666666665, ans=0.125 2023-11-27 08:02:42,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3803026.6666666665, ans=0.0 2023-11-27 08:03:03,252 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5350, loss[loss=0.05676, simple_loss=0.07235, pruned_loss=0.009671, audio_tagging_loss=0.01092, over 15505.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08909, pruned_loss=0.01171, audio_tagging_loss=0.00834, over 3040684.57 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:03:05,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3803160.0, ans=0.0 2023-11-27 08:03:15,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3803226.6666666665, ans=0.125 2023-11-27 08:03:18,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5 2023-11-27 08:03:25,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3803293.3333333335, ans=0.07 2023-11-27 08:03:30,608 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-27 08:03:52,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3803426.6666666665, ans=0.1 2023-11-27 08:03:56,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=22.5 2023-11-27 08:03:56,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3803426.6666666665, ans=0.125 2023-11-27 08:04:00,236 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5400, loss[loss=0.06755, simple_loss=0.09328, pruned_loss=0.01143, audio_tagging_loss=0.009476, over 14860.00 frames. 
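The optim.py:476 lines summarize the optimizer's gradient-norm bookkeeping: the five numbers are the min, 25%, 50%, 75% and max of recently observed gradient norms, and the logged threshold is consistent with Clipping_scale times the median (2.0 x 9.444e+01 is about 1.889e+02 in the first such entry above, and 2.0 x 9.486e+01 is about 1.897e+02 in the next), with percent-clipped reporting how often that threshold was actually hit. A sketch of that bookkeeping follows; the window size and the exact update rule are assumptions for illustration.

import torch

class NormHistory:
    def __init__(self, clipping_scale=2.0, window=200):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []

    def update_and_threshold(self, grad_norm: float) -> float:
        # Keep a sliding window of recent gradient norms.
        self.norms.append(grad_norm)
        self.norms = self.norms[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # Threshold = clipping_scale * median, matching the logged numbers.
        return self.clipping_scale * q[2].item()

hist = NormHistory()
for g in [77.2, 88.8, 94.4, 104.2, 120.3]:
    thr = hist.update_and_threshold(g)
# A gradient whose norm exceeds `thr` would be rescaled down to norm `thr`.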
], tot_loss[loss=0.06531, simple_loss=0.09012, pruned_loss=0.0119, audio_tagging_loss=0.008347, over 3039093.02 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:04:04,717 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:04:09,352 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.46 vs. limit=10.0 2023-11-27 08:04:25,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.928e+01 9.462e+01 1.035e+02 1.260e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 08:04:26,240 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-27 08:04:30,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3803626.6666666665, ans=0.125 2023-11-27 08:04:37,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-27 08:04:55,116 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5450, loss[loss=0.06442, simple_loss=0.08169, pruned_loss=0.01809, audio_tagging_loss=0.005488, over 13668.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.09032, pruned_loss=0.01185, audio_tagging_loss=0.008375, over 3041024.90 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:05:00,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-11-27 08:05:00,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3803826.6666666665, ans=0.125 2023-11-27 08:05:22,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-27 08:05:45,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3804093.3333333335, ans=0.0 2023-11-27 08:05:51,032 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5500, loss[loss=0.06692, simple_loss=0.08501, pruned_loss=0.01247, audio_tagging_loss=0.01195, over 15801.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.09001, pruned_loss=0.01194, audio_tagging_loss=0.008459, over 3042283.92 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:05:52,396 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:06:16,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.180e+01 9.726e+01 1.043e+02 1.311e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 08:06:18,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-27 08:06:23,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3804360.0, ans=0.2 2023-11-27 08:06:37,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3804426.6666666665, ans=0.125 2023-11-27 08:06:47,606 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5550, loss[loss=0.06474, simple_loss=0.08571, pruned_loss=0.01348, audio_tagging_loss=0.008408, over 16095.00 frames. 
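In the train_asr.py:1235 progress lines, loss[...] is the current batch and tot_loss[...] is a frame-weighted running aggregate. The fractional frame counts (e.g. "over 3039093.02 frames") indicate an exponentially decayed sum rather than a plain one: with a decay like 1 - 1/200 and roughly 15k frames per batch, the effective window is about 200 batches, i.e. about 3.0M frames, which matches the logged totals. A minimal sketch under those assumptions (the interval of 200 is inferred, not confirmed):

class DecayedLoss:
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: int) -> float:
        # Decay the running sums, then add the new batch, frame-weighted.
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames  # the reported tot_loss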
], tot_loss[loss=0.06562, simple_loss=0.09, pruned_loss=0.01208, audio_tagging_loss=0.008539, over 3040637.99 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:07:09,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2023-11-27 08:07:11,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3804626.6666666665, ans=0.125 2023-11-27 08:07:14,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-27 08:07:21,394 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:07:41,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3804760.0, ans=0.2 2023-11-27 08:07:43,565 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5600, loss[loss=0.06877, simple_loss=0.1005, pruned_loss=0.01122, audio_tagging_loss=0.007301, over 14995.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09031, pruned_loss=0.01196, audio_tagging_loss=0.00864, over 3042106.03 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:07:50,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3804826.6666666665, ans=0.95 2023-11-27 08:07:50,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3804826.6666666665, ans=0.125 2023-11-27 08:08:07,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2023-11-27 08:08:10,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.878e+01 8.987e+01 9.756e+01 1.044e+02 1.605e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 08:08:10,634 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-27 08:08:16,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3805026.6666666665, ans=0.0 2023-11-27 08:08:23,706 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 08:08:30,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-27 08:08:32,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-27 08:08:33,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3805093.3333333335, ans=0.2 2023-11-27 08:08:36,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-27 08:08:39,232 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5650, loss[loss=0.08064, simple_loss=0.1107, pruned_loss=0.01922, audio_tagging_loss=0.006064, over 14667.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08976, pruned_loss=0.01193, audio_tagging_loss=0.00882, over 3047784.01 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:08:42,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3805160.0, ans=0.125 2023-11-27 08:08:46,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2023-11-27 08:08:49,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3805160.0, ans=0.125 2023-11-27 08:09:06,191 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-27 08:09:10,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3805293.3333333335, ans=0.0 2023-11-27 08:09:15,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-27 08:09:34,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-27 08:09:35,528 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5700, loss[loss=0.07534, simple_loss=0.1051, pruned_loss=0.01441, audio_tagging_loss=0.008396, over 16158.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08992, pruned_loss=0.01196, audio_tagging_loss=0.008851, over 3045799.65 frames. 
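The train_asr.py:1481 warnings like the one above exclude AudioSet cuts whose transcript is placeholder text: after the encoder's roughly 4x frame subsampling only 23 frames remain, fewer than the 24 BPE tokens, and a transducer loss cannot align fewer encoder frames than output tokens. A sketch of the implied predicate (the exact condition in train_asr.py may differ):

def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least as many frames as tokens.
    return frames_after_subsampling >= num_tokens

print(keep_cut(23, 24))  # False: excluded, matching the warning above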
], batch size: 60, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:09:52,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3805560.0, ans=0.125 2023-11-27 08:09:59,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3805626.6666666665, ans=0.0 2023-11-27 08:09:59,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3805626.6666666665, ans=0.125 2023-11-27 08:10:01,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.888e+01 9.534e+01 1.037e+02 1.369e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 08:10:01,871 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-27 08:10:04,469 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-27 08:10:05,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2023-11-27 08:10:30,868 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5750, loss[loss=0.06775, simple_loss=0.09784, pruned_loss=0.01013, audio_tagging_loss=0.008701, over 15238.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08952, pruned_loss=0.01193, audio_tagging_loss=0.008712, over 3054208.09 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:10:32,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3805826.6666666665, ans=0.125 2023-11-27 08:10:37,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3805826.6666666665, ans=0.1 2023-11-27 08:10:41,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3805893.3333333335, ans=0.05 2023-11-27 08:10:58,278 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-27 08:11:09,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3806026.6666666665, ans=0.04949747468305833 2023-11-27 08:11:27,062 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5800, loss[loss=0.06365, simple_loss=0.09014, pruned_loss=0.01282, audio_tagging_loss=0.005764, over 14386.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08959, pruned_loss=0.0121, audio_tagging_loss=0.008627, over 3046093.75 frames. 
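The grad_scale value in these progress lines (moving through 32.0, 16.0 and 8.0 across this stretch) is the dynamic loss scale of mixed-precision training. The usual policy, sketched below, halves the scale when a step overflows (the step is skipped) and doubles it back after a run of clean steps; the growth interval here is illustrative, not the configured value.

class LossScale:
    def __init__(self, scale=32.0, growth_interval=2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale *= 0.5       # back off on overflow, skip the step
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= 2.0   # cautiously grow back
        return self.scale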
], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:11:28,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3806160.0, ans=0.125 2023-11-27 08:11:52,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3806293.3333333335, ans=0.125 2023-11-27 08:11:53,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.206e+01 9.616e+01 1.021e+02 1.551e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 08:11:53,815 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-27 08:11:59,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3806360.0, ans=0.2 2023-11-27 08:12:15,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3806426.6666666665, ans=0.2 2023-11-27 08:12:16,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3806426.6666666665, ans=0.125 2023-11-27 08:12:23,485 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5850, loss[loss=0.06499, simple_loss=0.08697, pruned_loss=0.01153, audio_tagging_loss=0.009974, over 14722.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08867, pruned_loss=0.01186, audio_tagging_loss=0.008509, over 3044706.65 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:12:30,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3806493.3333333335, ans=0.025 2023-11-27 08:12:49,871 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-27 08:12:54,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3806626.6666666665, ans=0.125 2023-11-27 08:13:18,638 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5900, loss[loss=0.07665, simple_loss=0.1063, pruned_loss=0.01654, audio_tagging_loss=0.006968, over 14453.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08856, pruned_loss=0.01191, audio_tagging_loss=0.008452, over 3041540.10 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:13:23,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. 
limit=15.0 2023-11-27 08:13:34,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3806893.3333333335, ans=0.1 2023-11-27 08:13:36,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3806893.3333333335, ans=0.125 2023-11-27 08:13:37,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3806893.3333333335, ans=0.125 2023-11-27 08:13:42,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3806960.0, ans=0.125 2023-11-27 08:13:45,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.203e+01 9.720e+01 1.067e+02 1.821e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 08:13:45,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-27 08:13:51,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3807026.6666666665, ans=0.035 2023-11-27 08:13:52,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3807026.6666666665, ans=0.0 2023-11-27 08:13:53,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3807026.6666666665, ans=0.125 2023-11-27 08:13:56,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-27 08:13:58,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3807026.6666666665, ans=0.0 2023-11-27 08:14:14,862 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5950, loss[loss=0.04204, simple_loss=0.05567, pruned_loss=0.007282, audio_tagging_loss=0.006923, over 15504.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08934, pruned_loss=0.01185, audio_tagging_loss=0.008371, over 3048224.76 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:14:15,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-11-27 08:14:22,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3807160.0, ans=0.125 2023-11-27 08:14:40,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3807293.3333333335, ans=0.0 2023-11-27 08:14:41,373 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-27 08:14:53,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2023-11-27 08:14:56,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3807360.0, ans=0.0 2023-11-27 08:15:01,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3807426.6666666665, ans=0.125 2023-11-27 08:15:10,301 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6000, loss[loss=0.08382, simple_loss=0.1205, pruned_loss=0.01794, audio_tagging_loss=0.005645, over 16018.00 frames. 
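The scaling.py:1022 Whitening lines (e.g. "metric=12.33 vs. limit=15.0") compare, per feature group, how far the activations' covariance is from a multiple of the identity; the penalty only activates once the metric exceeds the logged limit, so most of these entries are purely diagnostic. The sketch below implements one such whiteness proxy, equal to 1.0 for perfectly white features and growing as energy concentrates in fewer directions; the normalization details are assumptions, not icefall's exact formula.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); split channels into groups.
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, n, d)
    cov = xg.transpose(1, 2) @ xg / n                  # (groups, d, d)
    tr = cov.diagonal(dim1=1, dim2=2).sum(-1)          # trace(C)
    tr2 = (cov * cov).sum(dim=(1, 2))                  # trace(C @ C)
    # tr2 * d / tr^2 is 1.0 when C is a multiple of I, up to d when all
    # the energy lies in a single direction.
    return (tr2 * d / tr.clamp(min=1e-20) ** 2).mean()

x = torch.randn(1000, 192)
print(whitening_metric(x, num_groups=1))  # near 1.0 for white Gaussian noise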
], tot_loss[loss=0.06506, simple_loss=0.0895, pruned_loss=0.01197, audio_tagging_loss=0.00834, over 3046970.26 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:15:10,301 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 08:15:30,772 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4767, 3.8510, 3.0611, 3.8677], device='cuda:2') 2023-11-27 08:15:42,633 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05815, simple_loss=0.05046, pruned_loss=0.005371, audio_tagging_loss=0.02755, over 4681554.00 frames. 2023-11-27 08:15:42,634 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 08:15:48,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3807493.3333333335, ans=0.125 2023-11-27 08:15:50,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3807493.3333333335, ans=0.0 2023-11-27 08:15:50,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3807493.3333333335, ans=0.125 2023-11-27 08:15:50,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3807493.3333333335, ans=0.0 2023-11-27 08:15:58,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3807560.0, ans=0.0 2023-11-27 08:15:59,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2023-11-27 08:16:04,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5 2023-11-27 08:16:10,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.889e+01 9.644e+01 1.039e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 08:16:10,339 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-27 08:16:14,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3807626.6666666665, ans=0.125 2023-11-27 08:16:20,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3807693.3333333335, ans=0.1 2023-11-27 08:16:23,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3807693.3333333335, ans=0.125 2023-11-27 08:16:24,065 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 08:16:24,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3807693.3333333335, ans=0.125 2023-11-27 08:16:28,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3807760.0, ans=0.0 2023-11-27 08:16:30,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3807760.0, ans=0.0 2023-11-27 08:16:39,038 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6050, loss[loss=0.07407, simple_loss=0.099, pruned_loss=0.01372, audio_tagging_loss=0.01086, over 14794.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08855, pruned_loss=0.01186, audio_tagging_loss=0.008461, over 3048171.76 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:16:44,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3807826.6666666665, ans=0.2 2023-11-27 08:16:48,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3807826.6666666665, ans=15.0 2023-11-27 08:17:05,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-27 08:17:07,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2023-11-27 08:17:35,804 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6100, loss[loss=0.07476, simple_loss=0.1047, pruned_loss=0.01695, audio_tagging_loss=0.005465, over 15022.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08853, pruned_loss=0.01184, audio_tagging_loss=0.008444, over 3059902.88 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:17:43,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3808160.0, ans=0.125 2023-11-27 08:17:44,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3808160.0, ans=0.0 2023-11-27 08:17:46,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. limit=10.0 2023-11-27 08:17:50,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3808226.6666666665, ans=0.1 2023-11-27 08:17:52,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3808226.6666666665, ans=0.0 2023-11-27 08:18:01,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.030e+01 9.632e+01 1.027e+02 1.334e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 08:18:01,917 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-27 08:18:06,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.04 vs. 
limit=10.0 2023-11-27 08:18:07,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3808360.0, ans=0.125 2023-11-27 08:18:21,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3808426.6666666665, ans=0.125 2023-11-27 08:18:25,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3808426.6666666665, ans=10.0 2023-11-27 08:18:31,176 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6150, loss[loss=0.06278, simple_loss=0.08797, pruned_loss=0.01058, audio_tagging_loss=0.008213, over 16042.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08826, pruned_loss=0.0119, audio_tagging_loss=0.008561, over 3056587.00 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:18:53,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3808626.6666666665, ans=0.125 2023-11-27 08:18:54,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2023-11-27 08:18:58,685 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-27 08:19:26,889 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6200, loss[loss=0.0671, simple_loss=0.08833, pruned_loss=0.01329, audio_tagging_loss=0.009646, over 15884.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.0872, pruned_loss=0.0116, audio_tagging_loss=0.008587, over 3049988.68 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:19:34,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3808826.6666666665, ans=0.0 2023-11-27 08:19:44,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3808893.3333333335, ans=0.0 2023-11-27 08:19:44,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3808893.3333333335, ans=0.1 2023-11-27 08:19:53,786 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.915e+01 9.429e+01 1.009e+02 1.347e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 08:19:53,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-27 08:20:00,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3809026.6666666665, ans=0.0 2023-11-27 08:20:23,685 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6250, loss[loss=0.06264, simple_loss=0.08621, pruned_loss=0.01147, audio_tagging_loss=0.008073, over 15397.00 frames. ], tot_loss[loss=0.06368, simple_loss=0.08696, pruned_loss=0.01149, audio_tagging_loss=0.008712, over 3053475.15 frames. 
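The zipformer.py:1877 validation diagnostic a few entries above (attn_weights_entropy = tensor([4.4767, 3.8510, 3.0611, 3.8677], ...)) prints the entropy of self-attention weights, one value per head: higher entropy means attention spread over more key positions, lower entropy means sharper attention. A minimal sketch of that statistic, where the averaging axes are an assumption:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), rows softmax-normalized.
    ent = -(attn * attn.clamp(min=1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)  # one entropy value per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # a 4-entry tensor, one value per head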
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:20:30,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3809160.0, ans=0.2 2023-11-27 08:20:49,667 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-27 08:20:54,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3809293.3333333335, ans=0.125 2023-11-27 08:21:10,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3809426.6666666665, ans=0.0 2023-11-27 08:21:17,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3809426.6666666665, ans=0.1 2023-11-27 08:21:19,183 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6300, loss[loss=0.05586, simple_loss=0.07907, pruned_loss=0.006943, audio_tagging_loss=0.009377, over 15528.00 frames. ], tot_loss[loss=0.06387, simple_loss=0.08727, pruned_loss=0.01148, audio_tagging_loss=0.008752, over 3048257.12 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:21:40,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3809626.6666666665, ans=0.125 2023-11-27 08:21:46,279 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.820e+01 9.366e+01 1.014e+02 1.355e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:21:46,380 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-27 08:22:15,173 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6350, loss[loss=0.07457, simple_loss=0.09785, pruned_loss=0.0159, audio_tagging_loss=0.009743, over 16088.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08824, pruned_loss=0.01166, audio_tagging_loss=0.008817, over 3049120.07 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:22:41,860 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-27 08:23:11,271 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6400, loss[loss=0.0642, simple_loss=0.08907, pruned_loss=0.01115, audio_tagging_loss=0.008513, over 14281.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08765, pruned_loss=0.01161, audio_tagging_loss=0.009032, over 3043551.42 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:23:28,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.33 vs. 
limit=15.0 2023-11-27 08:23:37,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-27 08:23:39,071 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.870e+01 9.357e+01 1.034e+02 1.188e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 08:23:42,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3810293.3333333335, ans=0.0 2023-11-27 08:24:00,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3810426.6666666665, ans=0.05 2023-11-27 08:24:00,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3810426.6666666665, ans=0.125 2023-11-27 08:24:03,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3810426.6666666665, ans=0.0 2023-11-27 08:24:07,243 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6450, loss[loss=0.07076, simple_loss=0.1023, pruned_loss=0.01127, audio_tagging_loss=0.008368, over 15061.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.0881, pruned_loss=0.01177, audio_tagging_loss=0.009002, over 3038490.40 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:24:11,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3810493.3333333335, ans=0.035 2023-11-27 08:24:21,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3810560.0, ans=0.09899494936611666 2023-11-27 08:24:33,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-27 08:24:46,242 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=12.0 2023-11-27 08:25:02,675 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6500, loss[loss=0.06449, simple_loss=0.0904, pruned_loss=0.01226, audio_tagging_loss=0.007033, over 16140.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.0877, pruned_loss=0.01173, audio_tagging_loss=0.008934, over 3039310.69 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:25:15,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3810893.3333333335, ans=0.0 2023-11-27 08:25:19,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3810893.3333333335, ans=0.125 2023-11-27 08:25:30,474 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-27 08:25:31,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.956e+01 9.682e+01 1.036e+02 1.299e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 08:25:42,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-11-27 08:25:54,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=12.0 2023-11-27 08:25:58,444 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6550, loss[loss=0.06931, simple_loss=0.09391, pruned_loss=0.01341, audio_tagging_loss=0.008947, over 15081.00 frames. 
], tot_loss[loss=0.06477, simple_loss=0.08791, pruned_loss=0.01194, audio_tagging_loss=0.008885, over 3038918.79 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:26:03,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3811160.0, ans=0.125 2023-11-27 08:26:07,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3811160.0, ans=0.1 2023-11-27 08:26:09,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3811226.6666666665, ans=0.0 2023-11-27 08:26:23,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0 2023-11-27 08:26:24,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3811293.3333333335, ans=0.0 2023-11-27 08:26:25,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-27 08:26:27,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3811293.3333333335, ans=0.1 2023-11-27 08:26:51,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3811426.6666666665, ans=0.0 2023-11-27 08:26:55,412 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6600, loss[loss=0.06704, simple_loss=0.1026, pruned_loss=0.01084, audio_tagging_loss=0.004924, over 16128.00 frames. ], tot_loss[loss=0.0639, simple_loss=0.08703, pruned_loss=0.01157, audio_tagging_loss=0.008805, over 3038463.90 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:27:07,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3811560.0, ans=0.1 2023-11-27 08:27:21,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-27 08:27:23,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.078e+01 9.642e+01 1.016e+02 1.265e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:27:29,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-27 08:27:32,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-27 08:27:37,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3811693.3333333335, ans=0.125 2023-11-27 08:27:40,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3811760.0, ans=0.125 2023-11-27 08:27:50,314 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6650, loss[loss=0.0704, simple_loss=0.1021, pruned_loss=0.01151, audio_tagging_loss=0.007843, over 14990.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08795, pruned_loss=0.01177, audio_tagging_loss=0.008665, over 3036089.92 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:28:01,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3811893.3333333335, ans=0.0 2023-11-27 08:28:15,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3811960.0, ans=0.125 2023-11-27 08:28:18,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-27 08:28:36,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3812093.3333333335, ans=0.04949747468305833 2023-11-27 08:28:46,295 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6700, loss[loss=0.07166, simple_loss=0.1025, pruned_loss=0.01361, audio_tagging_loss=0.0068, over 15121.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08834, pruned_loss=0.01183, audio_tagging_loss=0.008582, over 3037406.94 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:29:09,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3812293.3333333335, ans=0.07 2023-11-27 08:29:13,395 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-27 08:29:15,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.098e+01 9.634e+01 1.039e+02 1.370e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 08:29:30,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3812426.6666666665, ans=0.07 2023-11-27 08:29:35,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3812426.6666666665, ans=0.0 2023-11-27 08:29:42,712 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6750, loss[loss=0.0572, simple_loss=0.07914, pruned_loss=0.008339, audio_tagging_loss=0.00929, over 15206.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08848, pruned_loss=0.01198, audio_tagging_loss=0.008562, over 3036672.76 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:29:42,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3812493.3333333335, ans=0.125 2023-11-27 08:29:49,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-11-27 08:30:01,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3812560.0, ans=0.95 2023-11-27 08:30:09,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-27 08:30:38,287 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6800, loss[loss=0.09262, simple_loss=0.127, pruned_loss=0.02186, audio_tagging_loss=0.007261, over 14512.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08785, pruned_loss=0.01185, audio_tagging_loss=0.008652, over 3034585.96 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:31:05,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-27 08:31:07,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.218e+01 9.743e+01 1.054e+02 1.281e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 08:31:09,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=22.5 2023-11-27 08:31:33,858 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6850, loss[loss=0.04855, simple_loss=0.06226, pruned_loss=0.009986, audio_tagging_loss=0.007432, over 13968.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08791, pruned_loss=0.01187, audio_tagging_loss=0.00861, over 3033380.41 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:31:52,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3813226.6666666665, ans=0.125 2023-11-27 08:31:52,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=22.5 2023-11-27 08:32:01,227 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-27 08:32:16,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3813360.0, ans=0.05 2023-11-27 08:32:23,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3813426.6666666665, ans=0.1 2023-11-27 08:32:32,565 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6900, loss[loss=0.06104, simple_loss=0.08649, pruned_loss=0.009616, audio_tagging_loss=0.008179, over 16107.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08893, pruned_loss=0.01194, audio_tagging_loss=0.008527, over 3038546.30 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:32:36,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3813493.3333333335, ans=10.0 2023-11-27 08:32:40,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3813493.3333333335, ans=0.125 2023-11-27 08:32:58,978 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-27 08:33:01,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.797e+01 9.367e+01 1.009e+02 1.933e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:33:02,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3813626.6666666665, ans=0.0 2023-11-27 08:33:11,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3813693.3333333335, ans=0.125 2023-11-27 08:33:14,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3813693.3333333335, ans=0.0 2023-11-27 08:33:15,883 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:33:26,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3813760.0, ans=0.125 2023-11-27 08:33:28,058 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6950, loss[loss=0.07759, simple_loss=0.1041, pruned_loss=0.01695, audio_tagging_loss=0.008562, over 14860.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08898, pruned_loss=0.0118, audio_tagging_loss=0.008473, over 3035576.74 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:33:55,159 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-27 08:34:06,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.72 vs. limit=12.0 2023-11-27 08:34:20,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-27 08:34:23,570 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7000, loss[loss=0.08376, simple_loss=0.1159, pruned_loss=0.01882, audio_tagging_loss=0.006984, over 14818.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08922, pruned_loss=0.01192, audio_tagging_loss=0.008534, over 3041688.09 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:34:24,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2023-11-27 08:34:35,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3814226.6666666665, ans=0.0 2023-11-27 08:34:40,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3814226.6666666665, ans=0.125 2023-11-27 08:34:50,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-27 08:34:52,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.213e+01 9.596e+01 1.029e+02 1.427e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-27 08:34:55,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2023-11-27 08:35:19,302 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7050, loss[loss=0.06889, simple_loss=0.08596, pruned_loss=0.01458, audio_tagging_loss=0.01133, over 15182.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08805, pruned_loss=0.01175, audio_tagging_loss=0.008621, over 3041909.51 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:35:19,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3814493.3333333335, ans=0.0 2023-11-27 08:35:24,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3814493.3333333335, ans=0.0 2023-11-27 08:35:27,205 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. 
limit=10.0 2023-11-27 08:35:29,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3814560.0, ans=0.0 2023-11-27 08:35:33,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3814560.0, ans=0.0 2023-11-27 08:35:45,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-27 08:35:52,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=22.5 2023-11-27 08:36:14,653 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7100, loss[loss=0.06396, simple_loss=0.0807, pruned_loss=0.01378, audio_tagging_loss=0.009831, over 15529.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08867, pruned_loss=0.01192, audio_tagging_loss=0.008601, over 3051757.36 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:36:32,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3814893.3333333335, ans=0.0 2023-11-27 08:36:40,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3814960.0, ans=0.2 2023-11-27 08:36:42,594 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-27 08:36:42,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3814960.0, ans=0.125 2023-11-27 08:36:44,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.212e+01 9.022e+01 9.654e+01 1.030e+02 1.274e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 08:36:46,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3814960.0, ans=0.0 2023-11-27 08:37:11,089 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7150, loss[loss=0.06568, simple_loss=0.08809, pruned_loss=0.0142, audio_tagging_loss=0.007433, over 15526.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08853, pruned_loss=0.01202, audio_tagging_loss=0.008686, over 3044640.36 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:37:27,363 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:37:34,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3815293.3333333335, ans=0.0 2023-11-27 08:37:38,028 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-27 08:37:45,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3815360.0, ans=0.125 2023-11-27 08:37:46,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3815360.0, ans=0.09899494936611666 2023-11-27 08:38:07,750 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7200, loss[loss=0.07062, simple_loss=0.09178, pruned_loss=0.01577, audio_tagging_loss=0.008956, over 16690.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08884, pruned_loss=0.01203, audio_tagging_loss=0.008709, over 3049047.57 frames. 
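The scaling.py:1118 "WithLoss ... loss-sum=0.000e+00" lines report the accumulated value of an auxiliary penalty attached to the self-attention weights; a sum of zero means the regularizer is contributing nothing at the moment. A hedged sketch of the general pattern, attaching a penalty gradient to an intermediate tensor without changing its forward value; the quadratic penalty here is illustrative, not the actual regularizer:

import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, aux_loss_scale):
        ctx.save_for_backward(x)
        ctx.scale = aux_loss_scale
        return x  # forward value passes through unchanged

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Gradient of an illustrative penalty scale * mean(x^2), added on
        # top of the incoming gradient; no grad for the scale argument.
        aux_grad = ctx.scale * 2.0 * x / x.numel()
        return grad_out + aux_grad, None

x = torch.randn(8, requires_grad=True)
y = WithAuxLoss.apply(x, 0.1)
y.sum().backward()  # x.grad now includes the auxiliary penalty's gradient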
2023-11-27 08:38:11,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3815493.3333333335, ans=0.0
2023-11-27 08:38:13,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=22.5
2023-11-27 08:38:18,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3815560.0, ans=0.125
2023-11-27 08:38:21,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3815560.0, ans=0.125
2023-11-27 08:38:21,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3815560.0, ans=0.0
2023-11-27 08:38:33,742 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572350
2023-11-27 08:38:35,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 9.027e+01 9.481e+01 1.011e+02 1.295e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-27 08:38:54,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0
2023-11-27 08:38:55,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=15.0
2023-11-27 08:39:02,502 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7250, loss[loss=0.05065, simple_loss=0.06422, pruned_loss=0.00801, audio_tagging_loss=0.01054, over 14913.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08776, pruned_loss=0.01187, audio_tagging_loss=0.008899, over 3047963.50 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:39:20,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3815893.3333333335, ans=0.125
2023-11-27 08:39:24,244 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0
2023-11-27 08:39:29,493 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572400
2023-11-27 08:39:34,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3815960.0, ans=0.125
2023-11-27 08:39:34,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3815960.0, ans=0.125
2023-11-27 08:39:43,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3816026.6666666665, ans=0.0
2023-11-27 08:39:48,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3816093.3333333335, ans=0.125
2023-11-27 08:39:58,242 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7300, loss[loss=0.06668, simple_loss=0.09325, pruned_loss=0.01173, audio_tagging_loss=0.008322, over 14615.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08876, pruned_loss=0.01189, audio_tagging_loss=0.008779, over 3045812.28 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:39:58,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3816160.0, ans=0.2
2023-11-27 08:40:12,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3816226.6666666665, ans=0.1
2023-11-27 08:40:18,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3816226.6666666665, ans=0.125
2023-11-27 08:40:25,390 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572450
2023-11-27 08:40:27,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.262e+01 9.740e+01 1.057e+02 1.335e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-27 08:40:28,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. limit=10.0
2023-11-27 08:40:32,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3816360.0, ans=0.1
2023-11-27 08:40:36,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3816360.0, ans=0.125
2023-11-27 08:40:44,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3816426.6666666665, ans=0.2
2023-11-27 08:40:44,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2023-11-27 08:40:50,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3816426.6666666665, ans=0.125
2023-11-27 08:40:51,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3816426.6666666665, ans=0.07
2023-11-27 08:40:54,389 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7350, loss[loss=0.05511, simple_loss=0.07301, pruned_loss=0.01029, audio_tagging_loss=0.00831, over 13555.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08893, pruned_loss=0.0118, audio_tagging_loss=0.008591, over 3045899.85 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:41:02,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3816493.3333333335, ans=0.125
2023-11-27 08:41:09,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3816560.0, ans=0.125
2023-11-27 08:41:17,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3816626.6666666665, ans=0.125
2023-11-27 08:41:20,314 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572500
2023-11-27 08:41:39,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3816760.0, ans=0.0
2023-11-27 08:41:49,713 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7400, loss[loss=0.06084, simple_loss=0.08633, pruned_loss=0.01004, audio_tagging_loss=0.007639, over 14697.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08909, pruned_loss=0.01167, audio_tagging_loss=0.008509, over 3037312.89 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
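The optim.py lines report gradient-norm statistics rather than a fixed clipping threshold: the five numbers are the min/25%/median/75%/max of recent per-batch gradient norms, and in every such line above the logged threshold equals Clipping_scale times the median (e.g. 2.0 * 9.654e+01 = 1.931e+02). A hedged sketch of that bookkeeping (the windowing details are assumptions; the real logic lives in icefall's optim.py):

```python
# Hedged sketch of the "Clipping_scale=2.0, grad-norm quartiles ..." lines.
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: 1-D tensor of recent per-batch gradient norms (assumed window)
    quartiles = torch.quantile(
        grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]          # scale * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```

With thresholds near 1.9e+02 and all quartile maxima well below that, percent-clipped staying at 0.0 throughout this stretch is expected.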
2023-11-27 08:41:54,543 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0
2023-11-27 08:42:14,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3816960.0, ans=0.0
2023-11-27 08:42:16,172 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572550
2023-11-27 08:42:19,727 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.911e+01 9.063e+01 9.701e+01 1.022e+02 1.505e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-27 08:42:37,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3817093.3333333335, ans=0.125
2023-11-27 08:42:44,743 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7450, loss[loss=0.0584, simple_loss=0.08364, pruned_loss=0.009161, audio_tagging_loss=0.007417, over 14464.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.09002, pruned_loss=0.01187, audio_tagging_loss=0.008372, over 3040100.92 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 08:43:02,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3817226.6666666665, ans=0.1
2023-11-27 08:43:12,338 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572600
2023-11-27 08:43:13,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3817293.3333333335, ans=0.0
2023-11-27 08:43:35,602 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 08:43:41,241 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7500, loss[loss=0.06761, simple_loss=0.09664, pruned_loss=0.01169, audio_tagging_loss=0.007609, over 15006.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09079, pruned_loss=0.01209, audio_tagging_loss=0.008228, over 3044463.36 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 08:43:41,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3817493.3333333335, ans=0.1
2023-11-27 08:43:43,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3817493.3333333335, ans=0.125
2023-11-27 08:43:50,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3817493.3333333335, ans=0.125
2023-11-27 08:43:56,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3817560.0, ans=0.125
2023-11-27 08:44:07,904 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572650
2023-11-27 08:44:09,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3817626.6666666665, ans=0.0
2023-11-27 08:44:11,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.822e+01 9.501e+01 1.047e+02 1.367e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-27 08:44:12,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=22.5
2023-11-27 08:44:15,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3817693.3333333335, ans=0.125
2023-11-27 08:44:37,578 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7550, loss[loss=0.05529, simple_loss=0.0756, pruned_loss=0.009786, audio_tagging_loss=0.007705, over 15670.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08991, pruned_loss=0.01207, audio_tagging_loss=0.008229, over 3042036.48 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 08:44:40,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3817826.6666666665, ans=0.04949747468305833
2023-11-27 08:44:52,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3817893.3333333335, ans=0.1
2023-11-27 08:44:58,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3817960.0, ans=0.125
2023-11-27 08:45:01,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=22.5
2023-11-27 08:45:03,364 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572700
2023-11-27 08:45:09,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3818026.6666666665, ans=0.2
2023-11-27 08:45:11,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3818026.6666666665, ans=0.04949747468305833
2023-11-27 08:45:22,766 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0
2023-11-27 08:45:29,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3818093.3333333335, ans=0.0
2023-11-27 08:45:32,592 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7600, loss[loss=0.07264, simple_loss=0.1009, pruned_loss=0.01452, audio_tagging_loss=0.007686, over 15571.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08896, pruned_loss=0.01194, audio_tagging_loss=0.008336, over 3045338.50 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:45:48,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3818226.6666666665, ans=0.2
2023-11-27 08:45:59,810 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572750
2023-11-27 08:46:02,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.740e+01 9.501e+01 1.030e+02 1.304e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-27 08:46:10,865 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5
2023-11-27 08:46:15,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3818360.0, ans=0.0
2023-11-27 08:46:28,094 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7650, loss[loss=0.06202, simple_loss=0.09326, pruned_loss=0.009038, audio_tagging_loss=0.006348, over 15245.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.0888, pruned_loss=0.01197, audio_tagging_loss=0.00836, over 3043741.38 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
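The scaling.py:213 lines that dominate this log each print a ScheduledFloat: a scalar hyperparameter (skip rates, balancer probabilities, dropout, minimum bypass scales) whose value is a function of the global batch count, sampled for logging when the module runs. A minimal sketch of the idea, assuming piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are invented for illustration, and icefall's actual ScheduledFloat in scaling.py may differ in detail:

```python
# Hedged sketch of a batch-count-scheduled scalar hyperparameter.
import bisect

class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]      # past the last breakpoint: hold constant
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a conv_skip_rate annealed from 0.2 to 0.0 over the first 4000 batches
# (hypothetical schedule) would log ans=0.0 at batch_count=3814560.0 as above:
conv_skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
assert conv_skip_rate.value(3814560.0) == 0.0
```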
2023-11-27 08:46:28,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3818493.3333333335, ans=0.125
2023-11-27 08:46:28,649 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. limit=10.0
2023-11-27 08:46:37,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3818493.3333333335, ans=0.0
2023-11-27 08:46:49,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3818560.0, ans=0.125
2023-11-27 08:46:55,240 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572800
2023-11-27 08:47:24,433 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7700, loss[loss=0.05957, simple_loss=0.08611, pruned_loss=0.008584, audio_tagging_loss=0.007935, over 14838.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08907, pruned_loss=0.01197, audio_tagging_loss=0.008357, over 3042309.37 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:47:31,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0
2023-11-27 08:47:34,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3818893.3333333335, ans=0.125
2023-11-27 08:47:46,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3818960.0, ans=0.125
2023-11-27 08:47:50,528 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572850
2023-11-27 08:47:53,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 9.058e+01 9.794e+01 1.057e+02 1.473e+02, threshold=1.959e+02, percent-clipped=0.0
2023-11-27 08:47:55,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3818960.0, ans=0.0
2023-11-27 08:48:19,660 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7750, loss[loss=0.07696, simple_loss=0.1094, pruned_loss=0.01345, audio_tagging_loss=0.008796, over 15433.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08863, pruned_loss=0.01187, audio_tagging_loss=0.008417, over 3037672.80 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:48:39,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3819226.6666666665, ans=0.125
2023-11-27 08:48:47,340 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572900
2023-11-27 08:48:48,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3819293.3333333335, ans=0.1
2023-11-27 08:48:55,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3819360.0, ans=0.125
2023-11-27 08:49:11,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3819426.6666666665, ans=0.0
2023-11-27 08:49:15,383 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7800, loss[loss=0.07074, simple_loss=0.09432, pruned_loss=0.01483, audio_tagging_loss=0.008759, over 15386.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0889, pruned_loss=0.01189, audio_tagging_loss=0.008466, over 3044594.52 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:49:24,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3819493.3333333335, ans=0.2
2023-11-27 08:49:42,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572950
2023-11-27 08:49:45,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 9.181e+01 9.727e+01 1.046e+02 1.272e+02, threshold=1.945e+02, percent-clipped=0.0
2023-11-27 08:49:54,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0
2023-11-27 08:50:11,735 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7850, loss[loss=0.0553, simple_loss=0.06079, pruned_loss=0.01283, audio_tagging_loss=0.01207, over 14493.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08864, pruned_loss=0.01186, audio_tagging_loss=0.008603, over 3049139.78 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:50:22,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3819893.3333333335, ans=0.125
2023-11-27 08:50:35,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. limit=10.0
2023-11-27 08:50:36,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0
2023-11-27 08:50:38,046 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573000
2023-11-27 08:50:52,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3820026.6666666665, ans=0.125
2023-11-27 08:50:54,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.53 vs. limit=6.0
2023-11-27 08:51:07,215 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7900, loss[loss=0.0631, simple_loss=0.08851, pruned_loss=0.01189, audio_tagging_loss=0.00695, over 14840.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08906, pruned_loss=0.01193, audio_tagging_loss=0.008649, over 3049369.52 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:51:17,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3820226.6666666665, ans=0.125
2023-11-27 08:51:17,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0
2023-11-27 08:51:28,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0
2023-11-27 08:51:34,165 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573050
2023-11-27 08:51:37,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.827e+01 9.140e+01 9.856e+01 1.052e+02 1.450e+02, threshold=1.971e+02, percent-clipped=0.0
2023-11-27 08:51:41,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0
2023-11-27 08:51:51,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3820426.6666666665, ans=0.125
2023-11-27 08:51:58,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3820426.6666666665, ans=0.125
2023-11-27 08:51:59,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3820426.6666666665, ans=0.125
2023-11-27 08:52:02,773 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7950, loss[loss=0.06379, simple_loss=0.08682, pruned_loss=0.01136, audio_tagging_loss=0.009021, over 14535.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08949, pruned_loss=0.01214, audio_tagging_loss=0.008702, over 3058761.50 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:52:10,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3820493.3333333335, ans=0.1
2023-11-27 08:52:17,618 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 08:52:25,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3820626.6666666665, ans=0.5
2023-11-27 08:52:29,795 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573100
2023-11-27 08:52:39,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3820693.3333333335, ans=22.5
2023-11-27 08:52:45,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3820693.3333333335, ans=0.5
2023-11-27 08:52:48,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3820760.0, ans=0.125
2023-11-27 08:52:52,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3820760.0, ans=0.125
2023-11-27 08:52:59,097 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8000, loss[loss=0.06426, simple_loss=0.09371, pruned_loss=0.009836, audio_tagging_loss=0.007565, over 16788.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08924, pruned_loss=0.0121, audio_tagging_loss=0.008857, over 3058662.38 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:52:59,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3820826.6666666665, ans=0.025
2023-11-27 08:53:02,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=22.5
2023-11-27 08:53:09,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3820893.3333333335, ans=0.125
2023-11-27 08:53:17,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3820893.3333333335, ans=0.125
2023-11-27 08:53:25,497 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573150
2023-11-27 08:53:28,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.923e+01 9.617e+01 1.018e+02 1.242e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-27 08:53:45,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3821093.3333333335, ans=0.125
2023-11-27 08:53:54,533 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8050, loss[loss=0.06733, simple_loss=0.09044, pruned_loss=0.01265, audio_tagging_loss=0.009466, over 14122.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08949, pruned_loss=0.0121, audio_tagging_loss=0.008889, over 3048789.41 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 08:54:00,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.85 vs. limit=10.0
2023-11-27 08:54:09,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3821226.6666666665, ans=0.02
2023-11-27 08:54:09,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0
2023-11-27 08:54:15,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3821293.3333333335, ans=0.125
2023-11-27 08:54:21,105 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573200
2023-11-27 08:54:34,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=12.0
2023-11-27 08:54:47,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.65 vs. limit=6.0
2023-11-27 08:54:48,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3821426.6666666665, ans=0.2
2023-11-27 08:54:49,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3821493.3333333335, ans=0.2
2023-11-27 08:54:50,349 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8100, loss[loss=0.06352, simple_loss=0.07974, pruned_loss=0.01168, audio_tagging_loss=0.01197, over 16788.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09009, pruned_loss=0.01225, audio_tagging_loss=0.008808, over 3052752.19 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 16.0
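The scaling.py:1022 Whitening lines compare a per-module statistic against a limit (metric=... vs. limit=...); the whiten modules nudge activations toward a "white" covariance, and the metric measures how far off they are. One plausible formulation, treated here purely as an assumption: with eigenvalues λ of the per-group feature covariance, metric = mean(λ²)/mean(λ)², which equals 1.0 for perfectly whitened features and grows as variance concentrates in a few directions. A sketch under that assumption (the real formula lives in icefall's scaling.py and may differ):

```python
# Hedged sketch of a whitening metric of the kind these lines appear to track.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels are split into num_groups groups
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).permute(1, 0, 2)
    x = x - x.mean(dim=1, keepdim=True)
    covar = torch.matmul(x.transpose(1, 2), x) / n        # (groups, d, d)
    eigs = torch.linalg.eigvalsh(covar)                   # symmetric -> real
    # mean(eig^2) / mean(eig)^2 per group, then averaged over groups
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
    return metric.mean().item()
```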
2023-11-27 08:54:58,094 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 08:55:05,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3821560.0, ans=0.0
2023-11-27 08:55:13,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0
2023-11-27 08:55:15,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3821626.6666666665, ans=0.0
2023-11-27 08:55:17,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573250
2023-11-27 08:55:21,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 8.982e+01 9.731e+01 1.040e+02 1.240e+02, threshold=1.946e+02, percent-clipped=0.0
2023-11-27 08:55:39,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3821760.0, ans=0.0
2023-11-27 08:55:46,153 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8150, loss[loss=0.06587, simple_loss=0.08706, pruned_loss=0.01381, audio_tagging_loss=0.008528, over 16405.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08952, pruned_loss=0.01219, audio_tagging_loss=0.008692, over 3045791.39 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 08:55:55,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3821826.6666666665, ans=0.0
2023-11-27 08:56:01,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3821893.3333333335, ans=0.125
2023-11-27 08:56:08,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3821960.0, ans=0.0
2023-11-27 08:56:13,217 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573300
2023-11-27 08:56:41,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3822160.0, ans=0.125
2023-11-27 08:56:41,844 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8200, loss[loss=0.05341, simple_loss=0.07077, pruned_loss=0.01017, audio_tagging_loss=0.007854, over 14907.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08977, pruned_loss=0.0121, audio_tagging_loss=0.008536, over 3047836.84 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 08:56:42,911 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 08:56:46,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3822160.0, ans=0.0
2023-11-27 08:57:08,866 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573350
2023-11-27 08:57:09,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0
2023-11-27 08:57:13,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.014e+01 9.648e+01 1.048e+02 1.501e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-27 08:57:15,960 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 08:57:26,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0
2023-11-27 08:57:33,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3822426.6666666665, ans=0.125
2023-11-27 08:57:38,036 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8250, loss[loss=0.05578, simple_loss=0.07748, pruned_loss=0.00782, audio_tagging_loss=0.009213, over 15408.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0903, pruned_loss=0.01215, audio_tagging_loss=0.008548, over 3048258.50 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 08:57:57,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3822560.0, ans=0.125
2023-11-27 08:58:04,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573400
2023-11-27 08:58:15,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3822693.3333333335, ans=0.125
2023-11-27 08:58:19,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3822693.3333333335, ans=0.125
2023-11-27 08:58:28,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.55 vs. limit=22.5
2023-11-27 08:58:34,673 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8300, loss[loss=0.06581, simple_loss=0.09044, pruned_loss=0.01326, audio_tagging_loss=0.007328, over 15072.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.0894, pruned_loss=0.01198, audio_tagging_loss=0.008552, over 3053019.57 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
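The WARNING lines show why the dummy AudioSet cuts are dropped: a 1-second clip yields 100 feature frames but only 23 after the convolutional front-end's roughly 4x subsampling, while the placeholder transcript tokenizes to 24 BPE tokens. A transducer loss cannot align fewer frames than output tokens, so the cut is excluded. A sketch of that check (the function and the exact subsampling formula are illustrative assumptions, not the actual train_asr.py code):

```python
# Hedged sketch of the cut filter implied by the WARNING lines above.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Assumed conv front-end arithmetic; it reproduces the logged 100 -> 23:
    frames_after_subsampling = (num_frames - 7) // 4
    # A transducer needs at least one frame per emitted token.
    return frames_after_subsampling >= num_tokens

# The excluded dummy cut: 23 frames after subsampling vs. 24 tokens.
assert keep_cut(100, 24) is False
```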
2023-11-27 08:58:36,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3822826.6666666665, ans=10.0
2023-11-27 08:58:53,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3822893.3333333335, ans=0.125
2023-11-27 08:58:58,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3822960.0, ans=0.125
2023-11-27 08:59:01,349 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573450
2023-11-27 08:59:05,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.971e+01 9.543e+01 1.035e+02 1.385e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-27 08:59:06,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3823026.6666666665, ans=0.125
2023-11-27 08:59:17,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5
2023-11-27 08:59:20,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3823093.3333333335, ans=0.2
2023-11-27 08:59:30,201 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8350, loss[loss=0.06023, simple_loss=0.08242, pruned_loss=0.0113, audio_tagging_loss=0.007716, over 15954.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08923, pruned_loss=0.01193, audio_tagging_loss=0.008504, over 3057045.40 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 08:59:50,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3823226.6666666665, ans=0.125
2023-11-27 08:59:57,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573500
2023-11-27 09:00:18,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3823426.6666666665, ans=0.0
2023-11-27 09:00:23,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=12.0
2023-11-27 09:00:25,939 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8400, loss[loss=0.05926, simple_loss=0.08218, pruned_loss=0.009696, audio_tagging_loss=0.00848, over 14828.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08887, pruned_loss=0.0118, audio_tagging_loss=0.008566, over 3056421.23 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:00:27,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3823493.3333333335, ans=0.07
2023-11-27 09:00:36,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3823560.0, ans=0.2
2023-11-27 09:00:48,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3823626.6666666665, ans=0.2
2023-11-27 09:00:52,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573550
2023-11-27 09:00:56,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.941e+01 9.645e+01 1.032e+02 1.251e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-27 09:01:03,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3823693.3333333335, ans=0.2
2023-11-27 09:01:05,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3823693.3333333335, ans=0.0
2023-11-27 09:01:21,022 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8450, loss[loss=0.05778, simple_loss=0.07414, pruned_loss=0.01159, audio_tagging_loss=0.009118, over 14952.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08838, pruned_loss=0.01185, audio_tagging_loss=0.008605, over 3050552.81 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:01:21,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3823826.6666666665, ans=0.125
2023-11-27 09:01:36,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3823893.3333333335, ans=0.125
2023-11-27 09:01:47,453 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573600
2023-11-27 09:01:48,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0
2023-11-27 09:01:51,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3823960.0, ans=0.0
2023-11-27 09:01:51,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3823960.0, ans=0.125
2023-11-27 09:02:16,993 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8500, loss[loss=0.09047, simple_loss=0.1293, pruned_loss=0.01878, audio_tagging_loss=0.007063, over 15691.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08969, pruned_loss=0.01203, audio_tagging_loss=0.008584, over 3048803.92 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:02:21,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0
2023-11-27 09:02:27,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=22.5
2023-11-27 09:02:43,616 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573650
2023-11-27 09:02:46,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=22.5
2023-11-27 09:02:48,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 9.075e+01 9.563e+01 1.041e+02 1.324e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-27 09:02:53,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3824360.0, ans=0.0
2023-11-27 09:03:02,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3824426.6666666665, ans=0.0
2023-11-27 09:03:11,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5
2023-11-27 09:03:12,110 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8550, loss[loss=0.0698, simple_loss=0.09422, pruned_loss=0.01389, audio_tagging_loss=0.008798, over 16971.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08976, pruned_loss=0.01209, audio_tagging_loss=0.008547, over 3055883.04 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:03:16,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3824493.3333333335, ans=0.0
2023-11-27 09:03:16,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3824493.3333333335, ans=0.0
2023-11-27 09:03:18,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=12.0
2023-11-27 09:03:25,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.51 vs. limit=10.0
2023-11-27 09:03:29,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3824560.0, ans=22.5
2023-11-27 09:03:35,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3824626.6666666665, ans=0.0
2023-11-27 09:03:39,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573700
2023-11-27 09:04:01,993 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0
2023-11-27 09:04:06,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3824760.0, ans=0.125
2023-11-27 09:04:08,392 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8600, loss[loss=0.08252, simple_loss=0.1055, pruned_loss=0.02014, audio_tagging_loss=0.009631, over 14743.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0898, pruned_loss=0.01222, audio_tagging_loss=0.008634, over 3050235.22 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:04:15,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0
2023-11-27 09:04:22,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=12.0
2023-11-27 09:04:34,970 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573750
2023-11-27 09:04:39,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.143e+01 9.907e+01 1.055e+02 1.409e+02, threshold=1.981e+02, percent-clipped=0.0
2023-11-27 09:04:43,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3825026.6666666665, ans=0.1
2023-11-27 09:04:45,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0
2023-11-27 09:04:58,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=22.5
2023-11-27 09:05:04,578 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8650, loss[loss=0.06782, simple_loss=0.09661, pruned_loss=0.01291, audio_tagging_loss=0.006615, over 15004.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09034, pruned_loss=0.0122, audio_tagging_loss=0.008651, over 3050581.24 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:05:11,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2023-11-27 09:05:13,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3825160.0, ans=0.0
2023-11-27 09:05:15,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3825226.6666666665, ans=0.0
2023-11-27 09:05:23,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3825226.6666666665, ans=0.125
2023-11-27 09:05:26,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3825293.3333333335, ans=0.0
2023-11-27 09:05:27,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3825293.3333333335, ans=0.1
2023-11-27 09:05:30,634 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573800
2023-11-27 09:05:30,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3825293.3333333335, ans=0.125
2023-11-27 09:05:58,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3825426.6666666665, ans=0.125
2023-11-27 09:06:00,072 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8700, loss[loss=0.08133, simple_loss=0.1096, pruned_loss=0.02058, audio_tagging_loss=0.005933, over 15943.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08969, pruned_loss=0.01204, audio_tagging_loss=0.008701, over 3043566.74 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:06:27,597 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573850
2023-11-27 09:06:29,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3825626.6666666665, ans=0.1
2023-11-27 09:06:32,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.211e+01 9.788e+01 1.039e+02 1.317e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-27 09:06:39,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3825693.3333333335, ans=0.125
2023-11-27 09:06:42,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3825693.3333333335, ans=0.125
2023-11-27 09:06:48,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3825760.0, ans=0.0
2023-11-27 09:06:54,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3825826.6666666665, ans=0.0
2023-11-27 09:06:55,777 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8750, loss[loss=0.05747, simple_loss=0.07913, pruned_loss=0.005378, audio_tagging_loss=0.01253, over 15833.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08989, pruned_loss=0.01201, audio_tagging_loss=0.008783, over 3044326.22 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:06:56,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3825826.6666666665, ans=0.125
2023-11-27 09:07:06,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3825893.3333333335, ans=0.0
2023-11-27 09:07:15,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3825893.3333333335, ans=0.2
2023-11-27 09:07:22,887 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573900
2023-11-27 09:07:23,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2023-11-27 09:07:44,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3826093.3333333335, ans=0.125
2023-11-27 09:07:47,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0
2023-11-27 09:07:52,440 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8800, loss[loss=0.05714, simple_loss=0.07716, pruned_loss=0.008394, audio_tagging_loss=0.01016, over 14737.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09019, pruned_loss=0.0121, audio_tagging_loss=0.008852, over 3046530.49 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:07:54,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0
2023-11-27 09:07:59,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3826160.0, ans=0.125
2023-11-27 09:08:17,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.03 vs. limit=10.0
2023-11-27 09:08:18,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573950
2023-11-27 09:08:23,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.347e+01 1.002e+02 1.077e+02 1.340e+02, threshold=2.003e+02, percent-clipped=0.0
2023-11-27 09:08:32,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3826360.0, ans=0.125
2023-11-27 09:08:33,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3826360.0, ans=0.0
2023-11-27 09:08:37,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3826426.6666666665, ans=0.0
2023-11-27 09:08:47,536 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8850, loss[loss=0.07595, simple_loss=0.1124, pruned_loss=0.01192, audio_tagging_loss=0.007843, over 15615.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0915, pruned_loss=0.01213, audio_tagging_loss=0.008739, over 3051165.94 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:08:58,701 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 09:09:13,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574000
2023-11-27 09:09:42,749 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8900, loss[loss=0.06217, simple_loss=0.08844, pruned_loss=0.01068, audio_tagging_loss=0.007269, over 14207.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09226, pruned_loss=0.01225, audio_tagging_loss=0.008542, over 3053996.52 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:09:42,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3826826.6666666665, ans=0.125
2023-11-27 09:10:00,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3826893.3333333335, ans=0.04949747468305833
2023-11-27 09:10:05,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3826960.0, ans=0.125
2023-11-27 09:10:09,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574050
2023-11-27 09:10:10,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0
2023-11-27 09:10:16,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.024e+01 9.616e+01 1.025e+02 1.217e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-27 09:10:17,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3827026.6666666665, ans=0.2
2023-11-27 09:10:22,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=12.0
2023-11-27 09:10:24,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3827026.6666666665, ans=0.125
2023-11-27 09:10:38,544 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8950, loss[loss=0.0534, simple_loss=0.06928, pruned_loss=0.008997, audio_tagging_loss=0.009762, over 15476.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.0914, pruned_loss=0.01209, audio_tagging_loss=0.008493, over 3050941.82 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:10:41,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3827160.0, ans=0.0
2023-11-27 09:10:58,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3827226.6666666665, ans=0.0
2023-11-27 09:11:05,626 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574100
2023-11-27 09:11:13,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827360.0, ans=0.1
2023-11-27 09:11:22,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3827426.6666666665, ans=0.09899494936611666
2023-11-27 09:11:25,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0
2023-11-27 09:11:32,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3827426.6666666665, ans=0.125
2023-11-27 09:11:34,686 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9000, loss[loss=0.05163, simple_loss=0.06966, pruned_loss=0.007214, audio_tagging_loss=0.009586, over 15888.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09141, pruned_loss=0.01215, audio_tagging_loss=0.008437, over 3057829.02 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:11:34,687 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 09:11:47,135 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5046, 4.2591, 3.6274, 4.0772], device='cuda:2')
2023-11-27 09:12:07,537 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05893, simple_loss=0.05035, pruned_loss=0.005253, audio_tagging_loss=0.0285, over 4681554.00 frames.
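During the validation pass, zipformer.py also dumps diagnostics such as attn_weights_entropy: one value per attention head (four here), the average entropy in nats of that head's attention distribution. Uniform attention over L keys would give log(L), so values around 3.6-4.5 indicate moderately peaked heads. A hedged sketch of how such a diagnostic can be computed (shapes and reduction order are assumptions, not zipformer.py's exact code):

```python
# Hedged sketch of a per-head attention-entropy diagnostic.
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, batch, query_len, key_len); rows sum to 1
    p = attn_weights.clamp(min=1e-20)
    entropy = -(p * p.log()).sum(dim=-1)   # entropy per query position
    return entropy.mean(dim=(1, 2))        # mean entropy per head (nats)
```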
2023-11-27 09:12:07,538 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 09:12:14,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3827493.3333333335, ans=0.125 2023-11-27 09:12:34,554 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574150 2023-11-27 09:12:40,848 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.087e+01 9.718e+01 1.070e+02 1.602e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 09:12:41,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3827693.3333333335, ans=0.0 2023-11-27 09:12:49,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3827693.3333333335, ans=0.0 2023-11-27 09:13:03,708 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9050, loss[loss=0.07332, simple_loss=0.1031, pruned_loss=0.01459, audio_tagging_loss=0.007206, over 14997.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0902, pruned_loss=0.01193, audio_tagging_loss=0.008438, over 3055526.72 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:13:12,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3827826.6666666665, ans=0.0 2023-11-27 09:13:30,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574200 2023-11-27 09:13:31,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.44 vs. limit=15.0 2023-11-27 09:13:35,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3828026.6666666665, ans=0.125 2023-11-27 09:13:42,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3828026.6666666665, ans=0.2 2023-11-27 09:13:45,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3828026.6666666665, ans=0.015 2023-11-27 09:13:49,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3828093.3333333335, ans=0.125 2023-11-27 09:13:59,519 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9100, loss[loss=0.07123, simple_loss=0.1062, pruned_loss=0.01287, audio_tagging_loss=0.005242, over 15248.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08894, pruned_loss=0.01175, audio_tagging_loss=0.008501, over 3051729.69 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:14:08,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3828160.0, ans=0.0 2023-11-27 09:14:16,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828226.6666666665, ans=0.125 2023-11-27 09:14:26,643 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574250 2023-11-27 09:14:32,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.030e+01 9.534e+01 1.010e+02 1.225e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 09:14:38,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3828360.0, ans=0.0 2023-11-27 09:14:46,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3828426.6666666665, ans=0.125 2023-11-27 09:14:55,589 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9150, loss[loss=0.06449, simple_loss=0.08153, pruned_loss=0.01294, audio_tagging_loss=0.01078, over 14898.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08813, pruned_loss=0.01167, audio_tagging_loss=0.008534, over 3048178.00 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:15:01,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3828493.3333333335, ans=0.125 2023-11-27 09:15:02,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3828493.3333333335, ans=0.125 2023-11-27 09:15:10,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3828560.0, ans=0.1 2023-11-27 09:15:11,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3828560.0, ans=0.2 2023-11-27 09:15:12,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.11 vs. limit=22.5 2023-11-27 09:15:22,655 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574300 2023-11-27 09:15:45,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3828760.0, ans=0.1 2023-11-27 09:15:45,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3828760.0, ans=0.07 2023-11-27 09:15:51,840 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9200, loss[loss=0.07067, simple_loss=0.1047, pruned_loss=0.01187, audio_tagging_loss=0.006474, over 15228.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08873, pruned_loss=0.0117, audio_tagging_loss=0.008464, over 3043825.94 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:16:18,613 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574350 2023-11-27 09:16:24,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.042e+01 9.589e+01 1.020e+02 1.357e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 09:16:27,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3829026.6666666665, ans=0.125 2023-11-27 09:16:28,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3829026.6666666665, ans=0.0 2023-11-27 09:16:33,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3829026.6666666665, ans=0.125 2023-11-27 09:16:35,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3829093.3333333335, ans=0.04949747468305833 2023-11-27 09:16:36,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2023-11-27 09:16:46,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3829160.0, ans=0.125 2023-11-27 09:16:47,644 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9250, loss[loss=0.07993, simple_loss=0.1185, pruned_loss=0.01368, audio_tagging_loss=0.006979, over 16309.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08932, pruned_loss=0.01191, audio_tagging_loss=0.008372, over 3048496.45 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:16:49,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2023-11-27 09:17:07,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3829226.6666666665, ans=0.1 2023-11-27 09:17:08,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3829293.3333333335, ans=0.07 2023-11-27 09:17:14,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574400 2023-11-27 09:17:20,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3829360.0, ans=0.125 2023-11-27 09:17:23,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.98 vs. limit=22.5 2023-11-27 09:17:24,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3829360.0, ans=0.125 2023-11-27 09:17:29,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3829360.0, ans=0.125 2023-11-27 09:17:42,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3829493.3333333335, ans=0.0 2023-11-27 09:17:43,332 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9300, loss[loss=0.06109, simple_loss=0.07991, pruned_loss=0.01297, audio_tagging_loss=0.008171, over 14143.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08914, pruned_loss=0.01195, audio_tagging_loss=0.008433, over 3043257.61 frames. 
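
The scaling.py:213 lines print the current value (ans=...) of ScheduledFloat hyperparameters as a function of batch_count. A sketch of the underlying idea, assuming piecewise-linear interpolation between breakpoints with clamping at the ends; the real scaling.py class differs in details.

    import bisect

    class PiecewiseLinearSchedule:
        """A float interpolated linearly between (batch_count, value)
        breakpoints and clamped outside them."""

        def __init__(self, *points):
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count):
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A skip rate decaying from 0.5 to 0.0 over the first 20k batches (made-up
    # breakpoints) would print ans=0.0 at batch_count ~3.83e6, which is why most
    # schedules above sit at their final values this late in training.
    skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (20000.0, 0.0))
    assert skip_rate.value(3828026.0) == 0.0
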
], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:17:47,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3829493.3333333335, ans=0.125 2023-11-27 09:18:09,919 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574450 2023-11-27 09:18:16,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.342e+01 9.838e+01 1.063e+02 1.386e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 09:18:22,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3829693.3333333335, ans=10.0 2023-11-27 09:18:25,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3829693.3333333335, ans=0.125 2023-11-27 09:18:28,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.54 vs. limit=10.0 2023-11-27 09:18:38,925 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9350, loss[loss=0.06663, simple_loss=0.09527, pruned_loss=0.01197, audio_tagging_loss=0.007024, over 14015.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08943, pruned_loss=0.01195, audio_tagging_loss=0.008386, over 3034048.24 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:18:45,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-27 09:18:52,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0 2023-11-27 09:18:55,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829893.3333333335, ans=0.1 2023-11-27 09:18:55,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3829893.3333333335, ans=0.5 2023-11-27 09:18:58,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3829893.3333333335, ans=0.125 2023-11-27 09:19:05,629 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574500 2023-11-27 09:19:08,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3829960.0, ans=0.0 2023-11-27 09:19:11,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=22.5 2023-11-27 09:19:23,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3830093.3333333335, ans=0.0 2023-11-27 09:19:24,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3830093.3333333335, ans=6.0 2023-11-27 09:19:30,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3830093.3333333335, ans=0.125 2023-11-27 09:19:34,668 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9400, loss[loss=0.06499, simple_loss=0.09556, pruned_loss=0.009042, audio_tagging_loss=0.008162, over 14879.00 frames. 
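
The scaling.py:1022 Whitening lines compare a per-module metric against a limit (e.g. 8.54 vs. limit=10.0 below). One plausible whiteness measure, assumed here purely for illustration rather than taken from scaling.py: num_channels * trace(C @ C) / trace(C)**2 for the channel covariance C, which equals 1.0 when C is proportional to the identity and grows with the eigenvalue spread.

    import torch

    def whiteness_metric(feats):
        """Illustrative anisotropy measure of the channel distribution."""
        x = feats.reshape(-1, feats.shape[-1])            # (frames, channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]                      # (channels, channels)
        n = cov.shape[0]
        return (n * (cov * cov).sum() / cov.trace() ** 2).item()

    # A Whiten module would only push gradients toward whiter activations while
    # the measured metric exceeds its limit (8.54 vs. limit=10.0: inactive).
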
], tot_loss[loss=0.06514, simple_loss=0.08919, pruned_loss=0.01198, audio_tagging_loss=0.008568, over 3034244.10 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:19:38,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3830160.0, ans=0.07 2023-11-27 09:19:38,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3830160.0, ans=0.125 2023-11-27 09:19:49,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3830226.6666666665, ans=0.0 2023-11-27 09:20:01,306 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574550 2023-11-27 09:20:07,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3830360.0, ans=0.125 2023-11-27 09:20:08,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3830360.0, ans=0.125 2023-11-27 09:20:09,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 8.905e+01 9.680e+01 1.031e+02 1.220e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 09:20:11,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3830360.0, ans=0.0 2023-11-27 09:20:29,215 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:20:30,793 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9450, loss[loss=0.06144, simple_loss=0.08206, pruned_loss=0.01019, audio_tagging_loss=0.01022, over 14808.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08945, pruned_loss=0.01201, audio_tagging_loss=0.008602, over 3040610.40 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:20:35,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.36 vs. 
limit=12.0 2023-11-27 09:20:39,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3830493.3333333335, ans=0.125 2023-11-27 09:20:57,576 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574600 2023-11-27 09:20:57,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3830626.6666666665, ans=0.1 2023-11-27 09:21:06,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3830693.3333333335, ans=0.2 2023-11-27 09:21:12,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3830693.3333333335, ans=0.2 2023-11-27 09:21:14,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3830760.0, ans=0.125 2023-11-27 09:21:16,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3830760.0, ans=0.0 2023-11-27 09:21:26,370 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9500, loss[loss=0.0679, simple_loss=0.09952, pruned_loss=0.01096, audio_tagging_loss=0.007176, over 16041.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.0886, pruned_loss=0.01192, audio_tagging_loss=0.008744, over 3037689.08 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:21:28,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3830826.6666666665, ans=0.125 2023-11-27 09:21:48,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3830960.0, ans=0.125 2023-11-27 09:21:52,658 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574650 2023-11-27 09:22:01,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.168e+01 9.748e+01 1.058e+02 1.599e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 09:22:12,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3831093.3333333335, ans=0.07 2023-11-27 09:22:22,110 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9550, loss[loss=0.05886, simple_loss=0.07547, pruned_loss=0.01195, audio_tagging_loss=0.009174, over 13808.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08853, pruned_loss=0.0119, audio_tagging_loss=0.008807, over 3032662.00 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:22:23,613 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.02 vs. 
limit=15.0 2023-11-27 09:22:26,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3831160.0, ans=0.125 2023-11-27 09:22:32,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3831226.6666666665, ans=0.05 2023-11-27 09:22:45,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3831293.3333333335, ans=0.0 2023-11-27 09:22:47,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3831293.3333333335, ans=0.2 2023-11-27 09:22:48,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574700 2023-11-27 09:22:51,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3831293.3333333335, ans=0.0 2023-11-27 09:22:52,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3831293.3333333335, ans=0.125 2023-11-27 09:22:58,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3831360.0, ans=0.2 2023-11-27 09:23:03,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0 2023-11-27 09:23:16,909 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9600, loss[loss=0.06317, simple_loss=0.08451, pruned_loss=0.009724, audio_tagging_loss=0.01119, over 15712.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08861, pruned_loss=0.01186, audio_tagging_loss=0.008881, over 3040858.17 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:23:22,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3831493.3333333335, ans=0.02 2023-11-27 09:23:26,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3831493.3333333335, ans=0.0 2023-11-27 09:23:34,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3831560.0, ans=0.125 2023-11-27 09:23:35,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3831560.0, ans=0.2 2023-11-27 09:23:42,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3831626.6666666665, ans=0.125 2023-11-27 09:23:44,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574750 2023-11-27 09:23:51,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 9.086e+01 9.692e+01 1.047e+02 1.227e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 09:23:52,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3831693.3333333335, ans=0.0 2023-11-27 09:24:00,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2023-11-27 09:24:12,912 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9650, loss[loss=0.06687, simple_loss=0.09609, pruned_loss=0.01259, audio_tagging_loss=0.006227, over 15008.00 frames. 
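
The recurring WARNING entries drop one-second AudioSet cuts: their dummy transcript tokenizes to 24 BPE tokens, but the 100 input frames shrink to only 23 encoder frames after subsampling, too few for a transducer alignment. A sketch of such a filter, where the subsampling arithmetic is an assumption chosen to reproduce the logged numbers.

    def keep_cut(num_frames, num_tokens):
        """Filter behind the 'Exclude cut' warnings (sketch)."""
        t = ((num_frames - 7) // 2 + 1) // 2   # frames after ~4x subsampling
        # A transducer needs at least one encoder frame per output token, so a
        # 24-token dummy transcript cannot be aligned against 23 frames.
        return t >= num_tokens

    assert ((100 - 7) // 2 + 1) // 2 == 23     # matches the warning text
    assert not keep_cut(100, 24)               # the 1-second cuts get dropped
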
], tot_loss[loss=0.06485, simple_loss=0.08845, pruned_loss=0.01177, audio_tagging_loss=0.008856, over 3042631.14 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:24:17,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831826.6666666665, ans=0.1 2023-11-27 09:24:21,595 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:24:32,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3831893.3333333335, ans=0.04949747468305833 2023-11-27 09:24:33,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-27 09:24:39,543 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574800 2023-11-27 09:24:54,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3832026.6666666665, ans=0.125 2023-11-27 09:24:56,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.66 vs. limit=10.0 2023-11-27 09:25:01,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3832093.3333333335, ans=0.125 2023-11-27 09:25:02,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3832093.3333333335, ans=0.125 2023-11-27 09:25:08,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-11-27 09:25:09,690 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9700, loss[loss=0.0786, simple_loss=0.1124, pruned_loss=0.01613, audio_tagging_loss=0.006273, over 14906.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0899, pruned_loss=0.01213, audio_tagging_loss=0.008698, over 3049348.69 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:25:12,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3832160.0, ans=0.0 2023-11-27 09:25:24,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-27 09:25:36,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574850 2023-11-27 09:25:44,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 9.066e+01 9.770e+01 1.059e+02 1.296e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-27 09:26:01,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3832426.6666666665, ans=0.125 2023-11-27 09:26:05,277 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9750, loss[loss=0.07195, simple_loss=0.1081, pruned_loss=0.01153, audio_tagging_loss=0.006352, over 16023.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09033, pruned_loss=0.01218, audio_tagging_loss=0.008591, over 3051329.38 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:26:16,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3832560.0, ans=0.125 2023-11-27 09:26:30,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3832626.6666666665, ans=0.0 2023-11-27 09:26:32,884 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574900 2023-11-27 09:26:43,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3832693.3333333335, ans=0.0 2023-11-27 09:26:57,275 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-27 09:27:01,002 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9800, loss[loss=0.06032, simple_loss=0.08415, pruned_loss=0.01032, audio_tagging_loss=0.007921, over 15006.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0898, pruned_loss=0.01222, audio_tagging_loss=0.008558, over 3056287.71 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:27:01,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3832826.6666666665, ans=0.125 2023-11-27 09:27:07,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-11-27 09:27:17,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3832893.3333333335, ans=0.1 2023-11-27 09:27:26,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0 2023-11-27 09:27:28,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574950 2023-11-27 09:27:28,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3832960.0, ans=0.125 2023-11-27 09:27:28,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2023-11-27 09:27:29,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3832960.0, ans=0.125 2023-11-27 09:27:31,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3832960.0, ans=0.1 2023-11-27 09:27:36,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.044e+01 9.762e+01 1.048e+02 1.288e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 09:27:40,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3833026.6666666665, ans=0.125 2023-11-27 09:27:46,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833093.3333333335, ans=0.1 2023-11-27 09:27:51,903 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:27:57,773 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9850, loss[loss=0.06489, simple_loss=0.0898, pruned_loss=0.01184, audio_tagging_loss=0.008155, over 13844.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09081, pruned_loss=0.01224, audio_tagging_loss=0.008413, over 3049114.68 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:28:09,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3833226.6666666665, ans=0.0 2023-11-27 09:28:11,729 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:28:13,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833226.6666666665, ans=0.1 2023-11-27 09:28:16,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3833226.6666666665, ans=0.1 2023-11-27 09:28:22,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2023-11-27 09:28:23,730 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575000 2023-11-27 09:28:49,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-27 09:28:53,335 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9900, loss[loss=0.08124, simple_loss=0.1128, pruned_loss=0.01817, audio_tagging_loss=0.006648, over 15661.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09021, pruned_loss=0.01208, audio_tagging_loss=0.008515, over 3045178.40 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:29:08,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2023-11-27 09:29:11,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3833560.0, ans=0.125 2023-11-27 09:29:15,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3833626.6666666665, ans=0.125 2023-11-27 09:29:21,178 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575050 2023-11-27 09:29:24,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833626.6666666665, ans=0.1 2023-11-27 09:29:28,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.072e+01 9.748e+01 1.058e+02 2.513e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-27 09:29:29,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3833693.3333333335, ans=0.125 2023-11-27 09:29:34,127 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:29:39,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3833760.0, ans=0.125 2023-11-27 09:29:39,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3833760.0, ans=0.125 2023-11-27 09:29:43,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3833760.0, ans=0.2 2023-11-27 09:29:44,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2023-11-27 09:29:49,340 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9950, loss[loss=0.0538, simple_loss=0.07548, pruned_loss=0.007484, audio_tagging_loss=0.008581, over 14758.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08933, pruned_loss=0.0119, audio_tagging_loss=0.008512, over 3050733.73 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:30:05,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3833893.3333333335, ans=0.125 2023-11-27 09:30:16,020 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575100 2023-11-27 09:30:19,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2023-11-27 09:30:21,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3834026.6666666665, ans=0.125 2023-11-27 09:30:23,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-11-27 09:30:36,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3834093.3333333335, ans=0.04949747468305833 2023-11-27 09:30:44,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=22.5 2023-11-27 09:30:45,627 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10000, loss[loss=0.06165, simple_loss=0.0799, pruned_loss=0.0136, audio_tagging_loss=0.008099, over 15296.00 frames. 
], tot_loss[loss=0.06552, simple_loss=0.08995, pruned_loss=0.0121, audio_tagging_loss=0.008445, over 3054687.89 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:30:46,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3834160.0, ans=0.05 2023-11-27 09:31:11,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575150 2023-11-27 09:31:12,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2023-11-27 09:31:13,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2023-11-27 09:31:21,704 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.035e+01 9.520e+01 1.022e+02 1.313e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 09:31:27,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834360.0, ans=0.1 2023-11-27 09:31:36,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3834426.6666666665, ans=0.125 2023-11-27 09:31:40,801 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10050, loss[loss=0.08094, simple_loss=0.1111, pruned_loss=0.01673, audio_tagging_loss=0.008643, over 13803.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09001, pruned_loss=0.01205, audio_tagging_loss=0.008471, over 3047395.52 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:32:04,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3834626.6666666665, ans=0.125 2023-11-27 09:32:07,304 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575200 2023-11-27 09:32:09,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3834626.6666666665, ans=0.0 2023-11-27 09:32:25,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3834760.0, ans=0.0 2023-11-27 09:32:33,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3834760.0, ans=0.0 2023-11-27 09:32:34,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2023-11-27 09:32:36,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2023-11-27 09:32:36,492 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10100, loss[loss=0.05504, simple_loss=0.07559, pruned_loss=0.008038, audio_tagging_loss=0.009211, over 15177.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08982, pruned_loss=0.01202, audio_tagging_loss=0.008495, over 3049154.39 frames. 
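
Across these entries the reported loss is consistent with a fixed linear combination of the components: half the simple loss, plus the pruned loss, plus the audio-tagging loss. For the batch 10050 totals above: 0.5 x 0.09001 + 0.01205 + 0.008471 = 0.065526, matching loss=0.06552 up to display rounding. A sketch of that combination; the 0.5 weight is inferred from the numbers, and any warm-up ramping of the weights is ignored, which should not matter this deep into training.

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, audio_tagging_scale=1.0):
        # Weights inferred from the logged totals, not read from the recipe.
        return (simple_scale * simple_loss + pruned_loss
                + audio_tagging_scale * audio_tagging_loss)

    # Batch 10050: 0.5 * 0.09001 + 0.01205 + 0.008471 -> 0.065526 (~0.06552)
    print(combined_loss(0.09001, 0.01205, 0.008471))
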
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:33:03,225 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575250 2023-11-27 09:33:05,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3834960.0, ans=0.125 2023-11-27 09:33:12,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.969e+01 9.511e+01 1.051e+02 1.642e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 09:33:16,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3835026.6666666665, ans=0.125 2023-11-27 09:33:19,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2023-11-27 09:33:21,665 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:33:31,807 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10150, loss[loss=0.05977, simple_loss=0.07831, pruned_loss=0.009953, audio_tagging_loss=0.01066, over 15126.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08969, pruned_loss=0.01212, audio_tagging_loss=0.008558, over 3046851.27 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:33:39,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3835160.0, ans=0.125 2023-11-27 09:33:41,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3835160.0, ans=0.05 2023-11-27 09:33:50,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3835226.6666666665, ans=0.125 2023-11-27 09:33:51,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3835226.6666666665, ans=0.125 2023-11-27 09:33:59,258 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:33:59,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575300 2023-11-27 09:34:28,273 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10200, loss[loss=0.05501, simple_loss=0.07895, pruned_loss=0.00742, audio_tagging_loss=0.008118, over 13963.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.0893, pruned_loss=0.01202, audio_tagging_loss=0.008583, over 3040577.78 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:34:30,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3835493.3333333335, ans=0.125 2023-11-27 09:34:30,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0 2023-11-27 09:34:44,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3835560.0, ans=0.125 2023-11-27 09:34:48,929 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:34:54,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3835626.6666666665, ans=0.0 2023-11-27 09:34:54,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575350 2023-11-27 09:34:56,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=3835626.6666666665, ans=12.0 2023-11-27 09:34:57,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-11-27 09:35:03,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3835693.3333333335, ans=0.0 2023-11-27 09:35:06,657 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 9.149e+01 9.728e+01 1.044e+02 1.552e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 09:35:18,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3835760.0, ans=0.125 2023-11-27 09:35:24,028 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10250, loss[loss=0.06568, simple_loss=0.08915, pruned_loss=0.01153, audio_tagging_loss=0.009585, over 14172.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08889, pruned_loss=0.0119, audio_tagging_loss=0.008639, over 3046967.35 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:35:32,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3835826.6666666665, ans=0.125 2023-11-27 09:35:49,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3835960.0, ans=0.1 2023-11-27 09:35:49,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.57 vs. 
limit=10.0 2023-11-27 09:35:51,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575400 2023-11-27 09:36:12,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3836093.3333333335, ans=0.1 2023-11-27 09:36:19,989 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10300, loss[loss=0.06223, simple_loss=0.08573, pruned_loss=0.01006, audio_tagging_loss=0.0093, over 15543.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08874, pruned_loss=0.01184, audio_tagging_loss=0.008779, over 3051714.77 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:36:37,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-27 09:36:40,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3836226.6666666665, ans=0.1 2023-11-27 09:36:41,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3836293.3333333335, ans=0.2 2023-11-27 09:36:46,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575450 2023-11-27 09:36:57,292 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.883e+01 9.810e+01 1.061e+02 1.854e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 09:37:04,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3836426.6666666665, ans=0.0 2023-11-27 09:37:07,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-27 09:37:11,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-11-27 09:37:16,048 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10350, loss[loss=0.08705, simple_loss=0.1153, pruned_loss=0.02067, audio_tagging_loss=0.008748, over 14887.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08849, pruned_loss=0.01183, audio_tagging_loss=0.008881, over 3048398.40 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:37:24,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=22.5 2023-11-27 09:37:42,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575500 2023-11-27 09:37:52,140 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2023-11-27 09:37:53,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3836693.3333333335, ans=0.0 2023-11-27 09:38:02,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3836760.0, ans=0.0 2023-11-27 09:38:08,643 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2023-11-27 09:38:11,096 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10400, loss[loss=0.07633, simple_loss=0.1026, pruned_loss=0.01549, audio_tagging_loss=0.009562, over 16617.00 frames. 
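
The grad_scale value in these entries drifts between 8, 16, and 32: standard dynamic loss scaling for an fp16 run, where the scale is halved when a step produces inf/nan gradients and grows back after a run of clean steps. A sketch of the usual update rule, in the style of torch.cuda.amp.GradScaler (backoff 0.5 and growth 2.0 are its defaults; the growth interval used by this run is assumed).

    def update_grad_scale(scale, found_inf, good_steps, growth_interval=2000):
        """Dynamic fp16 loss-scale update (sketch)."""
        if found_inf:                      # overflow: halve, e.g. 16.0 -> 8.0
            return scale * 0.5, 0
        good_steps += 1
        if good_steps >= growth_interval:  # a clean run: double, 8.0 -> 16.0
            return scale * 2.0, 0
        return scale, good_steps
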
], tot_loss[loss=0.06476, simple_loss=0.08815, pruned_loss=0.01176, audio_tagging_loss=0.008921, over 3039814.12 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:38:17,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3836826.6666666665, ans=0.1 2023-11-27 09:38:38,217 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575550 2023-11-27 09:38:44,374 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:38:49,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.167e+01 9.734e+01 1.047e+02 2.020e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 09:39:06,813 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10450, loss[loss=0.07892, simple_loss=0.1019, pruned_loss=0.01861, audio_tagging_loss=0.009386, over 16244.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.0878, pruned_loss=0.01158, audio_tagging_loss=0.008901, over 3048207.31 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:39:10,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2023-11-27 09:39:12,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3837160.0, ans=0.125 2023-11-27 09:39:15,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837160.0, ans=0.1 2023-11-27 09:39:16,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-27 09:39:17,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-27 09:39:18,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2023-11-27 09:39:23,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3837226.6666666665, ans=0.09899494936611666 2023-11-27 09:39:33,887 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575600 2023-11-27 09:39:43,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3837360.0, ans=0.125 2023-11-27 09:39:52,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837426.6666666665, ans=0.1 2023-11-27 09:40:03,184 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10500, loss[loss=0.05806, simple_loss=0.08333, pruned_loss=0.009013, audio_tagging_loss=0.007382, over 15162.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08792, pruned_loss=0.01176, audio_tagging_loss=0.008816, over 3049523.50 frames. 
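
The scaling.py:1118 WithLoss entries track an auxiliary penalty attached to the self-attention weights; loss-sum=0.000e+00 means that penalty currently contributes nothing. The general wrapper pattern, sketched only to illustrate the idea and not taken from scaling.py's implementation:

    import torch

    class WithAuxLoss(torch.nn.Module):
        """Pass activations through unchanged while recording an auxiliary
        penalty for the training loop to add to the main loss."""

        def __init__(self, penalty_fn):
            super().__init__()
            self.penalty_fn = penalty_fn
            self.loss_sum = torch.tensor(0.0)

        def forward(self, x):
            self.loss_sum = self.penalty_fn(x)   # 0.0 when nothing is penalized
            return x
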
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:40:16,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837560.0, ans=0.1 2023-11-27 09:40:29,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575650 2023-11-27 09:40:41,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.939e+01 9.707e+01 1.019e+02 1.510e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 09:40:44,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3837693.3333333335, ans=0.0 2023-11-27 09:40:48,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3837760.0, ans=0.0 2023-11-27 09:40:55,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3837760.0, ans=0.0 2023-11-27 09:40:58,841 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10550, loss[loss=0.06381, simple_loss=0.09143, pruned_loss=0.01163, audio_tagging_loss=0.006466, over 14957.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08835, pruned_loss=0.01184, audio_tagging_loss=0.008755, over 3048000.93 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:41:17,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-27 09:41:24,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3837960.0, ans=0.125 2023-11-27 09:41:25,799 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575700 2023-11-27 09:41:42,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3838093.3333333335, ans=0.125 2023-11-27 09:41:46,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3838093.3333333335, ans=0.04949747468305833 2023-11-27 09:41:54,062 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10600, loss[loss=0.07551, simple_loss=0.09466, pruned_loss=0.01618, audio_tagging_loss=0.01199, over 15762.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08881, pruned_loss=0.0119, audio_tagging_loss=0.008715, over 3051492.97 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:42:01,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2023-11-27 09:42:05,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.00 vs. 
limit=22.5 2023-11-27 09:42:09,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3838226.6666666665, ans=0.125 2023-11-27 09:42:20,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575750 2023-11-27 09:42:31,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.017e+01 9.530e+01 1.017e+02 1.584e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 09:42:31,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3838360.0, ans=0.1 2023-11-27 09:42:49,595 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10650, loss[loss=0.08504, simple_loss=0.1172, pruned_loss=0.02055, audio_tagging_loss=0.005897, over 15625.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08877, pruned_loss=0.01193, audio_tagging_loss=0.008581, over 3046714.67 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:42:54,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-11-27 09:42:58,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3838493.3333333335, ans=0.0 2023-11-27 09:43:15,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575800 2023-11-27 09:43:25,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3838693.3333333335, ans=0.1 2023-11-27 09:43:37,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3838760.0, ans=0.2 2023-11-27 09:43:44,270 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10700, loss[loss=0.06602, simple_loss=0.08948, pruned_loss=0.01265, audio_tagging_loss=0.008627, over 14497.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08907, pruned_loss=0.01189, audio_tagging_loss=0.008516, over 3047419.68 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:44:11,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575850 2023-11-27 09:44:21,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.303e+01 9.868e+01 1.046e+02 1.253e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-27 09:44:26,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3839026.6666666665, ans=0.0 2023-11-27 09:44:35,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3839093.3333333335, ans=0.1 2023-11-27 09:44:36,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3839093.3333333335, ans=0.0 2023-11-27 09:44:39,734 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10750, loss[loss=0.06298, simple_loss=0.09239, pruned_loss=0.009327, audio_tagging_loss=0.007462, over 14720.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.0876, pruned_loss=0.01168, audio_tagging_loss=0.008528, over 3045253.88 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:45:05,856 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575900
2023-11-27 09:45:16,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.88 vs. limit=10.0
2023-11-27 09:45:20,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=12.0
2023-11-27 09:45:22,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3839426.6666666665, ans=0.2
2023-11-27 09:45:29,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0
2023-11-27 09:45:34,378 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10800, loss[loss=0.05447, simple_loss=0.07574, pruned_loss=0.007377, audio_tagging_loss=0.009226, over 15174.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08803, pruned_loss=0.01162, audio_tagging_loss=0.00853, over 3049033.08 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:45:57,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3839626.6666666665, ans=0.1
2023-11-27 09:46:00,512 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575950
2023-11-27 09:46:03,852 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 09:46:07,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3839693.3333333335, ans=0.015
2023-11-27 09:46:11,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 8.958e+01 9.647e+01 1.051e+02 1.313e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-27 09:46:12,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3839693.3333333335, ans=0.125
2023-11-27 09:46:28,738 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10850, loss[loss=0.07108, simple_loss=0.08732, pruned_loss=0.01942, audio_tagging_loss=0.007999, over 14286.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08988, pruned_loss=0.01195, audio_tagging_loss=0.008474, over 3049163.28 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:46:52,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=22.5
2023-11-27 09:46:55,391 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576000
2023-11-27 09:47:02,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3839960.0, ans=0.0
2023-11-27 09:47:22,942 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 09:47:26,024 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10900, loss[loss=0.06976, simple_loss=0.1027, pruned_loss=0.0123, audio_tagging_loss=0.006111, over 15448.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08954, pruned_loss=0.01178, audio_tagging_loss=0.008426, over 3050069.82 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:47:46,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0
2023-11-27 09:47:52,299 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576050
2023-11-27 09:47:54,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3840293.3333333335, ans=0.125
2023-11-27 09:48:03,055 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.173e+01 9.584e+01 1.016e+02 1.234e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 09:48:10,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3840426.6666666665, ans=0.0
2023-11-27 09:48:15,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0
2023-11-27 09:48:21,488 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10950, loss[loss=0.05034, simple_loss=0.07205, pruned_loss=0.006576, audio_tagging_loss=0.007743, over 15274.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08889, pruned_loss=0.01158, audio_tagging_loss=0.008546, over 3045870.51 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:48:29,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3840493.3333333335, ans=0.1
2023-11-27 09:48:47,728 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576100
2023-11-27 09:49:00,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3840693.3333333335, ans=0.125
2023-11-27 09:49:06,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840760.0, ans=0.1
2023-11-27 09:49:07,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=3840760.0, ans=0.1
2023-11-27 09:49:15,863 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11000, loss[loss=0.06042, simple_loss=0.08028, pruned_loss=0.01041, audio_tagging_loss=0.009869, over 14866.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08792, pruned_loss=0.01157, audio_tagging_loss=0.008651, over 3045685.57 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:49:16,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0
2023-11-27 09:49:24,254 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 09:49:28,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3840893.3333333335, ans=0.0
2023-11-27 09:49:38,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3840960.0, ans=0.125
2023-11-27 09:49:38,748 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5
2023-11-27 09:49:40,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0
2023-11-27 09:49:42,412 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576150
2023-11-27 09:49:43,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0
2023-11-27 09:49:46,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3840960.0, ans=0.125
2023-11-27 09:49:51,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3841026.6666666665, ans=0.2
2023-11-27 09:49:53,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.909e+01 9.397e+01 1.014e+02 1.657e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-27 09:50:06,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0
2023-11-27 09:50:09,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3841160.0, ans=0.125
2023-11-27 09:50:10,560 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11050, loss[loss=0.07424, simple_loss=0.1028, pruned_loss=0.01704, audio_tagging_loss=0.00579, over 15020.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0892, pruned_loss=0.01193, audio_tagging_loss=0.008634, over 3048563.66 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:50:11,838 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 09:50:15,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0
2023-11-27 09:50:21,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3841226.6666666665, ans=0.07
2023-11-27 09:50:24,701 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=22.5
2023-11-27 09:50:36,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3841293.3333333335, ans=0.0
2023-11-27 09:50:37,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576200
2023-11-27 09:50:37,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0
2023-11-27 09:50:41,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3841293.3333333335, ans=0.2
2023-11-27 09:50:54,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3841426.6666666665, ans=0.125
2023-11-27 09:51:03,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0
2023-11-27 09:51:05,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=22.5
2023-11-27 09:51:05,939 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11100, loss[loss=0.06768, simple_loss=0.08903, pruned_loss=0.01387, audio_tagging_loss=0.009294, over 15775.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08904, pruned_loss=0.01206, audio_tagging_loss=0.008758, over 3060342.65 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:51:14,974 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 09:51:15,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3841560.0, ans=0.125
2023-11-27 09:51:20,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3841560.0, ans=0.125
2023-11-27 09:51:27,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3841626.6666666665, ans=0.125
2023-11-27 09:51:32,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576250
2023-11-27 09:51:33,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3841626.6666666665, ans=0.125
2023-11-27 09:51:44,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.157e+01 9.860e+01 1.054e+02 1.486e+02, threshold=1.972e+02, percent-clipped=0.0
2023-11-27 09:52:00,847 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11150, loss[loss=0.06567, simple_loss=0.09233, pruned_loss=0.01143, audio_tagging_loss=0.008078, over 16750.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08899, pruned_loss=0.01208, audio_tagging_loss=0.008777, over 3050565.18 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:52:04,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3841826.6666666665, ans=0.125
2023-11-27 09:52:05,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0
2023-11-27 09:52:08,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3841826.6666666665, ans=0.125
2023-11-27 09:52:22,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3841960.0, ans=0.125
2023-11-27 09:52:27,402 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576300
2023-11-27 09:52:29,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3841960.0, ans=0.0
2023-11-27 09:52:46,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3842093.3333333335, ans=0.0
2023-11-27 09:52:55,653 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11200, loss[loss=0.09224, simple_loss=0.1148, pruned_loss=0.02713, audio_tagging_loss=0.007698, over 14990.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08881, pruned_loss=0.01201, audio_tagging_loss=0.00885, over 3047675.66 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:53:03,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3842160.0, ans=0.2
2023-11-27 09:53:22,311 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576350
2023-11-27 09:53:27,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3842360.0, ans=0.0
2023-11-27 09:53:31,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3842360.0, ans=0.125
2023-11-27 09:53:33,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.019e+01 9.427e+01 1.023e+02 1.335e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-27 09:53:35,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3842360.0, ans=0.125
2023-11-27 09:53:50,653 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11250, loss[loss=0.06619, simple_loss=0.08634, pruned_loss=0.01573, audio_tagging_loss=0.007291, over 14573.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08897, pruned_loss=0.01183, audio_tagging_loss=0.00877, over 3051668.10 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:53:58,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3842493.3333333335, ans=0.125
2023-11-27 09:53:59,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0
2023-11-27 09:54:16,349 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576400
2023-11-27 09:54:20,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3842626.6666666665, ans=0.125
2023-11-27 09:54:26,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=12.0
2023-11-27 09:54:37,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3842760.0, ans=0.0
2023-11-27 09:54:45,846 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11300, loss[loss=0.05549, simple_loss=0.07791, pruned_loss=0.01002, audio_tagging_loss=0.006509, over 15068.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08838, pruned_loss=0.01173, audio_tagging_loss=0.008653, over 3051056.37 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:54:48,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3842826.6666666665, ans=0.125
2023-11-27 09:54:48,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3842826.6666666665, ans=0.125
2023-11-27 09:55:01,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3842893.3333333335, ans=0.0
2023-11-27 09:55:11,934 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576450
2023-11-27 09:55:17,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3843026.6666666665, ans=0.0
2023-11-27 09:55:25,397 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 9.044e+01 9.676e+01 1.062e+02 1.427e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-27 09:55:25,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3843026.6666666665, ans=0.1
2023-11-27 09:55:28,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3843093.3333333335, ans=0.125
2023-11-27 09:55:30,184 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.68 vs. limit=22.5
2023-11-27 09:55:40,052 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11350, loss[loss=0.06511, simple_loss=0.08964, pruned_loss=0.01233, audio_tagging_loss=0.007962, over 16064.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08904, pruned_loss=0.01187, audio_tagging_loss=0.008563, over 3056633.06 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:55:49,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3843160.0, ans=0.035
2023-11-27 09:56:02,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0
2023-11-27 09:56:06,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3843293.3333333335, ans=0.0
2023-11-27 09:56:07,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576500
2023-11-27 09:56:22,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3843360.0, ans=0.0
2023-11-27 09:56:23,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5
2023-11-27 09:56:35,321 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11400, loss[loss=0.05496, simple_loss=0.07367, pruned_loss=0.008898, audio_tagging_loss=0.009233, over 16170.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08892, pruned_loss=0.01176, audio_tagging_loss=0.008518, over 3052016.32 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:57:01,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576550
2023-11-27 09:57:01,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3843626.6666666665, ans=0.0
2023-11-27 09:57:12,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3843693.3333333335, ans=0.0
2023-11-27 09:57:15,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 9.108e+01 9.687e+01 1.045e+02 1.288e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 09:57:18,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0
2023-11-27 09:57:30,705 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11450, loss[loss=0.05466, simple_loss=0.06941, pruned_loss=0.01174, audio_tagging_loss=0.008214, over 14443.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08914, pruned_loss=0.01196, audio_tagging_loss=0.008457, over 3049317.78 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0
2023-11-27 09:57:42,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3843893.3333333335, ans=0.125
2023-11-27 09:57:53,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3843960.0, ans=0.0
2023-11-27 09:57:55,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3843960.0, ans=0.07
2023-11-27 09:57:56,235 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576600
2023-11-27 09:57:58,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3843960.0, ans=0.125
2023-11-27 09:57:59,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0
2023-11-27 09:58:02,783 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.67 vs. limit=15.0
2023-11-27 09:58:11,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3844026.6666666665, ans=0.125
2023-11-27 09:58:25,202 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11500, loss[loss=0.05925, simple_loss=0.08823, pruned_loss=0.006105, audio_tagging_loss=0.009029, over 14330.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08872, pruned_loss=0.01189, audio_tagging_loss=0.008491, over 3041926.20 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0
2023-11-27 09:58:25,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3844160.0, ans=0.1
2023-11-27 09:58:25,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0
2023-11-27 09:58:31,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0
2023-11-27 09:58:35,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3844226.6666666665, ans=0.1
2023-11-27 09:58:36,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3844226.6666666665, ans=0.1
2023-11-27 09:58:45,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3844226.6666666665, ans=0.125
2023-11-27 09:58:52,268 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576650
2023-11-27 09:58:58,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3844360.0, ans=0.1
2023-11-27 09:59:05,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.059e+01 9.598e+01 1.033e+02 1.422e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-27 09:59:19,829 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11550, loss[loss=0.08917, simple_loss=0.1184, pruned_loss=0.01963, audio_tagging_loss=0.01033, over 15778.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08944, pruned_loss=0.01197, audio_tagging_loss=0.008488, over 3049806.64 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0
2023-11-27 09:59:40,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3844560.0, ans=0.2
2023-11-27 09:59:41,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3844626.6666666665, ans=0.125
2023-11-27 09:59:46,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576700
2023-11-27 09:59:54,508 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 10:00:15,295 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11600, loss[loss=0.06826, simple_loss=0.09439, pruned_loss=0.0141, audio_tagging_loss=0.006963, over 15052.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09016, pruned_loss=0.01203, audio_tagging_loss=0.008438, over 3052915.82 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:00:20,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3844826.6666666665, ans=0.0
2023-11-27 10:00:27,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3844893.3333333335, ans=0.1
2023-11-27 10:00:41,421 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576750
2023-11-27 10:00:43,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2023-11-27 10:00:52,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.47 vs. limit=22.5
2023-11-27 10:00:55,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 9.083e+01 9.816e+01 1.051e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-27 10:01:08,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3845093.3333333335, ans=0.2
2023-11-27 10:01:09,975 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11650, loss[loss=0.0677, simple_loss=0.09986, pruned_loss=0.00989, audio_tagging_loss=0.007882, over 14502.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08987, pruned_loss=0.01199, audio_tagging_loss=0.008469, over 3050148.61 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:01:21,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3845226.6666666665, ans=0.1
2023-11-27 10:01:35,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3845293.3333333335, ans=0.0
2023-11-27 10:01:36,746 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576800
2023-11-27 10:01:36,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3845293.3333333335, ans=0.0
2023-11-27 10:01:51,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0
2023-11-27 10:01:55,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3845426.6666666665, ans=0.1
2023-11-27 10:02:05,305 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11700, loss[loss=0.06069, simple_loss=0.08707, pruned_loss=0.0086, audio_tagging_loss=0.008556, over 14736.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08888, pruned_loss=0.01195, audio_tagging_loss=0.008566, over 3045207.78 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:02:29,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3845626.6666666665, ans=0.125
2023-11-27 10:02:31,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576850
2023-11-27 10:02:32,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-11-27 10:02:40,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3845693.3333333335, ans=0.07
2023-11-27 10:02:45,919 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 8.983e+01 9.560e+01 1.031e+02 1.339e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-27 10:02:51,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3845760.0, ans=0.0
2023-11-27 10:02:52,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3845760.0, ans=0.2
2023-11-27 10:02:56,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3845760.0, ans=0.1
2023-11-27 10:03:00,652 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11750, loss[loss=0.06994, simple_loss=0.1048, pruned_loss=0.01173, audio_tagging_loss=0.005792, over 15379.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.09004, pruned_loss=0.012, audio_tagging_loss=0.008494, over 3047913.45 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:03:10,281 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5
2023-11-27 10:03:26,907 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576900
2023-11-27 10:03:29,280 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:03:41,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0
2023-11-27 10:03:46,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3846093.3333333335, ans=0.5
2023-11-27 10:03:47,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3846093.3333333335, ans=0.125
2023-11-27 10:03:48,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3846093.3333333335, ans=0.025
2023-11-27 10:03:51,120 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:03:55,765 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11800, loss[loss=0.09622, simple_loss=0.1457, pruned_loss=0.0159, audio_tagging_loss=0.007448, over 15771.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09031, pruned_loss=0.01213, audio_tagging_loss=0.008463, over 3045004.98 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:03:58,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3846160.0, ans=15.0
2023-11-27 10:04:00,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3846160.0, ans=0.125
2023-11-27 10:04:08,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3846226.6666666665, ans=0.125
2023-11-27 10:04:14,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3846226.6666666665, ans=0.125
2023-11-27 10:04:22,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576950
2023-11-27 10:04:34,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3846360.0, ans=0.0
2023-11-27 10:04:36,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.212e+01 9.788e+01 1.058e+02 1.368e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-27 10:04:37,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3846360.0, ans=0.125
2023-11-27 10:04:39,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3846426.6666666665, ans=0.125
2023-11-27 10:04:41,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3846426.6666666665, ans=0.2
2023-11-27 10:04:42,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3846426.6666666665, ans=0.125
2023-11-27 10:04:45,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3846426.6666666665, ans=0.125
2023-11-27 10:04:50,441 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11850, loss[loss=0.06445, simple_loss=0.08674, pruned_loss=0.01206, audio_tagging_loss=0.009021, over 14987.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08895, pruned_loss=0.01189, audio_tagging_loss=0.008589, over 3044843.16 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:04:50,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3846493.3333333335, ans=0.125
2023-11-27 10:04:52,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=22.5
2023-11-27 10:05:04,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3846560.0, ans=0.125
2023-11-27 10:05:07,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0
2023-11-27 10:05:09,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3846560.0, ans=0.125
2023-11-27 10:05:17,057 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577000
2023-11-27 10:05:22,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3846693.3333333335, ans=0.1
2023-11-27 10:05:36,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0
2023-11-27 10:05:46,120 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11900, loss[loss=0.06396, simple_loss=0.09304, pruned_loss=0.009169, audio_tagging_loss=0.008267, over 16313.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08932, pruned_loss=0.01186, audio_tagging_loss=0.008606, over 3047608.41 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:05:54,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3846826.6666666665, ans=0.125
2023-11-27 10:05:59,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0
2023-11-27 10:06:12,358 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577050
2023-11-27 10:06:14,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3846960.0, ans=0.1
2023-11-27 10:06:26,328 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.851e+01 8.935e+01 9.684e+01 1.048e+02 1.256e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 10:06:31,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3847093.3333333335, ans=0.1
2023-11-27 10:06:35,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3847093.3333333335, ans=0.2
2023-11-27 10:06:40,518 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11950, loss[loss=0.06386, simple_loss=0.07937, pruned_loss=0.01193, audio_tagging_loss=0.01224, over 14730.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08879, pruned_loss=0.01174, audio_tagging_loss=0.008764, over 3050570.39 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 10:06:43,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3847160.0, ans=15.0
2023-11-27 10:06:49,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3847160.0, ans=0.2
2023-11-27 10:07:05,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3847293.3333333335, ans=0.05
2023-11-27 10:07:07,305 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577100
2023-11-27 10:07:10,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3847293.3333333335, ans=0.1
2023-11-27 10:07:13,369 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0
2023-11-27 10:07:16,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3847360.0, ans=0.0
2023-11-27 10:07:16,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3847360.0, ans=0.2
2023-11-27 10:07:19,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3847360.0, ans=0.0
2023-11-27 10:07:20,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0
2023-11-27 10:07:33,994 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 12000, loss[loss=0.05872, simple_loss=0.08175, pruned_loss=0.007816, audio_tagging_loss=0.01003, over 15143.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08891, pruned_loss=0.01184, audio_tagging_loss=0.008871, over 3048064.09 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 10:07:33,995 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 10:08:06,008 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05797, simple_loss=0.05046, pruned_loss=0.005369, audio_tagging_loss=0.02737, over 4681554.00 frames.
2023-11-27 10:08:06,009 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-27 10:08:13,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3847493.3333333335, ans=0.95
2023-11-27 10:08:27,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3847626.6666666665, ans=0.125
2023-11-27 10:08:58,285 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 0, loss[loss=0.08905, simple_loss=0.1122, pruned_loss=0.01795, audio_tagging_loss=0.01498, over 16162.00 frames. ], tot_loss[loss=0.08905, simple_loss=0.1122, pruned_loss=0.01795, audio_tagging_loss=0.01498, over 16162.00 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:08:58,286 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 10:09:29,265 INFO [train_asr.py:1267] (2/4) Epoch 49, validation: loss=0.05781, simple_loss=0.05038, pruned_loss=0.005301, audio_tagging_loss=0.02732, over 4681554.00 frames.
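Note on the optim.py:476 diagnostics above: in each such line the logged threshold equals Clipping_scale times the median grad-norm quartile (e.g. 2.0 x 9.647e+01 = 1.929e+02), and percent-clipped reports how often recent gradient norms exceeded that threshold. A minimal sketch of that bookkeeping, assuming a simple rolling window of per-step gradient norms; GradNormClipper and its fields are illustrative names, not icefall's ScaledAdam internals:

```python
import numpy as np

class GradNormClipper:
    """Sketch of quantile-based clipping diagnostics, assuming
    threshold = clipping_scale * median(recent grad norms), which is
    what the logged numbers above suggest (illustrative only)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []       # rolling buffer of recent gradient norms
        self.num_clipped = 0
        self.num_steps = 0

    def step(self, grad_norm: float) -> float:
        """Record one step's gradient norm; return the scale to apply."""
        self.norms.append(grad_norm)
        self.norms = self.norms[-self.window:]
        median = float(np.quantile(self.norms, 0.5))
        threshold = self.clipping_scale * median
        self.num_steps += 1
        if grad_norm > threshold:
            self.num_clipped += 1
            return threshold / grad_norm  # shrink the update
        return 1.0

    def log_line(self) -> str:
        """Format a line like the optim.py entries in this log."""
        q = np.quantile(self.norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        pct = 100.0 * self.num_clipped / max(1, self.num_steps)
        return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                + " ".join(f"{v:.3e}" for v in q)
                + f", threshold={self.clipping_scale * q[2]:.3e}, "
                + f"percent-clipped={pct}")
```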
2023-11-27 10:09:29,266 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 10:09:29,328 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577150 2023-11-27 10:09:29,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3847653.3333333335, ans=0.09899494936611666 2023-11-27 10:09:42,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3847720.0, ans=0.0 2023-11-27 10:09:42,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.973e+01 9.407e+01 1.008e+02 1.108e+02 1.423e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-27 10:09:49,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3847786.6666666665, ans=0.1 2023-11-27 10:10:07,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3847853.3333333335, ans=0.125 2023-11-27 10:10:21,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3847920.0, ans=0.125 2023-11-27 10:10:23,743 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 50, loss[loss=0.07154, simple_loss=0.08332, pruned_loss=0.01263, audio_tagging_loss=0.01725, over 15391.00 frames. ], tot_loss[loss=0.07386, simple_loss=0.09186, pruned_loss=0.01193, audio_tagging_loss=0.016, over 689478.70 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:10:23,808 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577200 2023-11-27 10:10:40,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-11-27 10:11:04,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3848186.6666666665, ans=0.09899494936611666 2023-11-27 10:11:07,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3848253.3333333335, ans=0.125 2023-11-27 10:11:19,481 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 100, loss[loss=0.06461, simple_loss=0.077, pruned_loss=0.01024, audio_tagging_loss=0.01587, over 13426.00 frames. ], tot_loss[loss=0.07126, simple_loss=0.08778, pruned_loss=0.01159, audio_tagging_loss=0.01578, over 1203033.65 frames. 
], batch size: 52, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:11:19,545 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577250 2023-11-27 10:11:23,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3848320.0, ans=0.125 2023-11-27 10:11:32,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3848386.6666666665, ans=0.125 2023-11-27 10:11:34,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 9.835e+01 1.039e+02 1.086e+02 1.551e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-27 10:11:57,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3848520.0, ans=0.0 2023-11-27 10:11:58,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3848520.0, ans=0.0 2023-11-27 10:12:14,672 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 150, loss[loss=0.09642, simple_loss=0.1339, pruned_loss=0.02027, audio_tagging_loss=0.009207, over 15079.00 frames. ], tot_loss[loss=0.0698, simple_loss=0.0878, pruned_loss=0.01173, audio_tagging_loss=0.01417, over 1611842.66 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:12:14,741 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577300 2023-11-27 10:12:22,251 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:13:09,313 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 200, loss[loss=0.05632, simple_loss=0.07841, pruned_loss=0.009309, audio_tagging_loss=0.007806, over 15212.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.08886, pruned_loss=0.01174, audio_tagging_loss=0.01251, over 1928065.35 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:13:09,385 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577350 2023-11-27 10:13:22,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-27 10:13:25,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.137e+01 9.838e+01 1.045e+02 1.312e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 10:13:40,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3849120.0, ans=0.2 2023-11-27 10:13:46,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3849186.6666666665, ans=0.125 2023-11-27 10:13:58,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3849253.3333333335, ans=0.0 2023-11-27 10:14:04,775 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 250, loss[loss=0.09237, simple_loss=0.1242, pruned_loss=0.0229, audio_tagging_loss=0.007392, over 15683.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.08925, pruned_loss=0.01183, audio_tagging_loss=0.01136, over 2175168.29 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:14:04,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577400 2023-11-27 10:14:07,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3849320.0, ans=0.125 2023-11-27 10:14:28,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849453.3333333335, ans=0.1 2023-11-27 10:14:35,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3849453.3333333335, ans=0.125 2023-11-27 10:14:55,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=12.0 2023-11-27 10:14:59,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3849653.3333333335, ans=0.125 2023-11-27 10:15:00,634 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 300, loss[loss=0.07189, simple_loss=0.103, pruned_loss=0.01414, audio_tagging_loss=0.00627, over 14584.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.08988, pruned_loss=0.01195, audio_tagging_loss=0.01044, over 2369744.95 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:15:00,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577450 2023-11-27 10:15:15,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 9.252e+01 9.785e+01 1.052e+02 1.385e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-27 10:15:29,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849786.6666666665, ans=0.1 2023-11-27 10:15:38,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3849853.3333333335, ans=0.0 2023-11-27 10:15:43,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3849920.0, ans=0.95 2023-11-27 10:15:50,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3849920.0, ans=0.05 2023-11-27 10:15:55,357 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 350, loss[loss=0.08468, simple_loss=0.118, pruned_loss=0.01772, audio_tagging_loss=0.007939, over 15497.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09026, pruned_loss=0.01206, audio_tagging_loss=0.009859, over 2520243.94 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:15:55,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577500 2023-11-27 10:16:12,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3850053.3333333335, ans=12.0 2023-11-27 10:16:47,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2023-11-27 10:16:50,559 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 400, loss[loss=0.07484, simple_loss=0.1031, pruned_loss=0.01622, audio_tagging_loss=0.007077, over 15774.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08968, pruned_loss=0.01214, audio_tagging_loss=0.009497, over 2634752.67 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:16:50,631 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577550 2023-11-27 10:17:03,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2023-11-27 10:17:03,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=22.5 2023-11-27 10:17:07,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.990e+01 9.603e+01 1.040e+02 1.304e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 10:17:10,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3850386.6666666665, ans=0.125 2023-11-27 10:17:22,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3850520.0, ans=0.0 2023-11-27 10:17:34,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3850586.6666666665, ans=0.125 2023-11-27 10:17:40,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3850586.6666666665, ans=0.0 2023-11-27 10:17:46,058 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 450, loss[loss=0.06126, simple_loss=0.08758, pruned_loss=0.009525, audio_tagging_loss=0.007944, over 14793.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08898, pruned_loss=0.01204, audio_tagging_loss=0.009273, over 2722229.90 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:17:46,127 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577600 2023-11-27 10:17:46,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3850653.3333333335, ans=0.1 2023-11-27 10:17:48,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3850653.3333333335, ans=0.2 2023-11-27 10:18:20,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3850853.3333333335, ans=0.0 2023-11-27 10:18:39,696 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:18:40,538 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 500, loss[loss=0.09043, simple_loss=0.12, pruned_loss=0.02333, audio_tagging_loss=0.007106, over 14595.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09018, pruned_loss=0.01236, audio_tagging_loss=0.009052, over 2796520.79 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:18:40,603 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-27 10:18:40,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3850986.6666666665, ans=0.1 2023-11-27 10:18:41,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3850986.6666666665, ans=0.0 2023-11-27 10:18:56,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 9.066e+01 9.728e+01 1.042e+02 1.279e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 10:18:56,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3851053.3333333335, ans=0.1 2023-11-27 10:19:11,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3851120.0, ans=0.125 2023-11-27 10:19:15,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3851186.6666666665, ans=0.0 2023-11-27 10:19:24,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3851253.3333333335, ans=10.0 2023-11-27 10:19:35,126 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 550, loss[loss=0.06432, simple_loss=0.08008, pruned_loss=0.01399, audio_tagging_loss=0.01029, over 16244.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08978, pruned_loss=0.01233, audio_tagging_loss=0.008974, over 2850881.69 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:19:35,203 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-27 10:20:30,425 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 600, loss[loss=0.08062, simple_loss=0.1204, pruned_loss=0.01348, audio_tagging_loss=0.006962, over 15094.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08919, pruned_loss=0.01221, audio_tagging_loss=0.008941, over 2889915.82 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:20:30,494 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-27 10:20:38,314 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.66 vs. 
limit=10.0 2023-11-27 10:20:40,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3851653.3333333335, ans=0.1 2023-11-27 10:20:45,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3851720.0, ans=0.07 2023-11-27 10:20:47,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.017e+01 9.537e+01 1.031e+02 1.710e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 10:20:47,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3851720.0, ans=0.09899494936611666 2023-11-27 10:20:47,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:47,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:50,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:54,787 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:21:03,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3851853.3333333335, ans=0.125 2023-11-27 10:21:25,988 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 650, loss[loss=0.09133, simple_loss=0.123, pruned_loss=0.02285, audio_tagging_loss=0.006991, over 15694.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08927, pruned_loss=0.01215, audio_tagging_loss=0.008865, over 2923499.12 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:21:26,061 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-27 10:21:27,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3851986.6666666665, ans=0.125 2023-11-27 10:21:28,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3851986.6666666665, ans=0.0 2023-11-27 10:21:33,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3851986.6666666665, ans=0.0 2023-11-27 10:21:34,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3851986.6666666665, ans=0.07 2023-11-27 10:21:37,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.47 vs. 
limit=22.5 2023-11-27 10:21:47,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3852120.0, ans=0.125 2023-11-27 10:21:48,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3852120.0, ans=0.125 2023-11-27 10:22:03,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3852186.6666666665, ans=0.1 2023-11-27 10:22:16,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3852253.3333333335, ans=0.0 2023-11-27 10:22:20,663 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 700, loss[loss=0.0753, simple_loss=0.09661, pruned_loss=0.02016, audio_tagging_loss=0.006831, over 14274.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.089, pruned_loss=0.01218, audio_tagging_loss=0.008814, over 2950721.82 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:22:20,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-27 10:22:21,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3852320.0, ans=0.125 2023-11-27 10:22:37,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 9.117e+01 9.739e+01 1.041e+02 1.243e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 10:22:47,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3852453.3333333335, ans=0.125 2023-11-27 10:23:16,083 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 750, loss[loss=0.08866, simple_loss=0.114, pruned_loss=0.02237, audio_tagging_loss=0.009274, over 14691.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08954, pruned_loss=0.01221, audio_tagging_loss=0.008705, over 2970624.70 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:23:16,151 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-27 10:23:20,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3852653.3333333335, ans=0.0 2023-11-27 10:23:36,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3852786.6666666665, ans=0.2 2023-11-27 10:23:55,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3852853.3333333335, ans=0.2 2023-11-27 10:23:57,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3852853.3333333335, ans=0.125 2023-11-27 10:24:05,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2023-11-27 10:24:11,159 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 800, loss[loss=0.06984, simple_loss=0.09254, pruned_loss=0.01375, audio_tagging_loss=0.009826, over 15143.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08928, pruned_loss=0.01205, audio_tagging_loss=0.008757, over 2989975.05 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:24:11,233 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-27 10:24:26,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.085e+01 9.807e+01 1.032e+02 1.313e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 10:24:28,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2023-11-27 10:24:34,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3853120.0, ans=0.0 2023-11-27 10:24:34,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3853120.0, ans=0.09899494936611666 2023-11-27 10:24:34,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3853120.0, ans=0.09899494936611666 2023-11-27 10:24:36,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3853120.0, ans=0.2 2023-11-27 10:24:45,256 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:25:05,439 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 850, loss[loss=0.0613, simple_loss=0.08241, pruned_loss=0.009528, audio_tagging_loss=0.01056, over 14335.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08898, pruned_loss=0.012, audio_tagging_loss=0.008862, over 2995987.84 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:25:05,525 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-27 10:25:23,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3853386.6666666665, ans=0.2 2023-11-27 10:25:31,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3853453.3333333335, ans=0.125 2023-11-27 10:25:40,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3853520.0, ans=0.0 2023-11-27 10:25:54,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3853586.6666666665, ans=0.0 2023-11-27 10:26:01,083 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 900, loss[loss=0.06008, simple_loss=0.0825, pruned_loss=0.009708, audio_tagging_loss=0.009125, over 15093.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08905, pruned_loss=0.01195, audio_tagging_loss=0.008885, over 3002782.76 frames. 
2023-11-27 10:26:01,152 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578050
2023-11-27 10:26:09,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3853653.3333333335, ans=0.0
2023-11-27 10:26:14,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3853720.0, ans=0.5
2023-11-27 10:26:19,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 9.242e+01 9.846e+01 1.086e+02 1.686e+02, threshold=1.969e+02, percent-clipped=0.0
2023-11-27 10:26:20,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3853720.0, ans=0.1
2023-11-27 10:26:25,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3853786.6666666665, ans=0.0
2023-11-27 10:26:52,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3853920.0, ans=0.125
2023-11-27 10:26:52,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3853920.0, ans=0.1
2023-11-27 10:26:53,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3853920.0, ans=0.05
2023-11-27 10:26:56,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0
2023-11-27 10:26:56,609 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 950, loss[loss=0.05449, simple_loss=0.06847, pruned_loss=0.01249, audio_tagging_loss=0.00777, over 15717.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08861, pruned_loss=0.01185, audio_tagging_loss=0.008831, over 3009382.36 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:26:56,685 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578100
2023-11-27 10:27:25,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3854120.0, ans=0.0
2023-11-27 10:27:32,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3854186.6666666665, ans=0.2
2023-11-27 10:27:33,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3854186.6666666665, ans=0.0
2023-11-27 10:27:37,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3854186.6666666665, ans=0.1
2023-11-27 10:27:45,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3854253.3333333335, ans=0.125
2023-11-27 10:27:51,634 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1000, loss[loss=0.06684, simple_loss=0.09729, pruned_loss=0.01166, audio_tagging_loss=0.006537, over 14165.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08847, pruned_loss=0.01186, audio_tagging_loss=0.008679, over 3016059.93 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:27:51,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578150
2023-11-27 10:28:06,140 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0
2023-11-27 10:28:09,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.145e+01 9.757e+01 1.033e+02 1.378e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-27 10:28:10,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3854386.6666666665, ans=0.125
2023-11-27 10:28:15,058 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 10:28:18,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3854453.3333333335, ans=0.125
2023-11-27 10:28:46,183 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1050, loss[loss=0.06213, simple_loss=0.08694, pruned_loss=0.008837, audio_tagging_loss=0.009826, over 15845.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08869, pruned_loss=0.01186, audio_tagging_loss=0.008592, over 3027152.56 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:28:46,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578200
2023-11-27 10:28:46,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854653.3333333335, ans=0.1
2023-11-27 10:28:46,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3854653.3333333335, ans=0.0
2023-11-27 10:29:00,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3854720.0, ans=0.125
2023-11-27 10:29:00,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3854720.0, ans=0.125
2023-11-27 10:29:03,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854720.0, ans=0.1
2023-11-27 10:29:11,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5
2023-11-27 10:29:37,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3854920.0, ans=0.5
2023-11-27 10:29:41,663 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1100, loss[loss=0.07619, simple_loss=0.1035, pruned_loss=0.0192, audio_tagging_loss=0.005247, over 14838.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08938, pruned_loss=0.01195, audio_tagging_loss=0.008527, over 3033834.79 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
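The WARNING above drops a one-second AudioSet cut because it has 23 encoder frames after subsampling but 24 BPE tokens, and a transducer cannot emit more labels than it has frames. A sketch of such a filter; the subsampling formula is an assumption chosen to reproduce the logged 100 → 23 mapping:

```python
# A sketch of the short-cut filter implied by the WARNING above: a cut is
# excluded when the encoder output is shorter than its token sequence.
# The subsampling formula is assumed; it reproduces the logged 100 -> 23.

def frames_after_subsampling(num_frames: int) -> int:
    # two stride-2 stages with a 7-frame context, roughly 4x overall
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a transducer cannot emit more labels than it has encoder frames
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the log
print(keep_cut(100, 24))              # False -> the cut is excluded
```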
2023-11-27 10:29:41,731 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578250
2023-11-27 10:29:43,841 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 10:29:48,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3854986.6666666665, ans=10.0
2023-11-27 10:29:54,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0
2023-11-27 10:29:58,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3855053.3333333335, ans=0.2
2023-11-27 10:29:58,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 8.982e+01 9.681e+01 1.049e+02 1.414e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-27 10:30:01,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0
2023-11-27 10:30:07,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3855120.0, ans=0.035
2023-11-27 10:30:07,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0
2023-11-27 10:30:18,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3855186.6666666665, ans=0.125
2023-11-27 10:30:34,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3855253.3333333335, ans=0.125
2023-11-27 10:30:36,809 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1150, loss[loss=0.06491, simple_loss=0.09014, pruned_loss=0.008364, audio_tagging_loss=0.01147, over 16410.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08931, pruned_loss=0.01186, audio_tagging_loss=0.008412, over 3033174.92 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:30:36,882 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578300
2023-11-27 10:30:43,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2023-11-27 10:30:52,302 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0
2023-11-27 10:31:21,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3855586.6666666665, ans=0.0
2023-11-27 10:31:24,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3855586.6666666665, ans=0.05
2023-11-27 10:31:31,593 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1200, loss[loss=0.06803, simple_loss=0.09248, pruned_loss=0.01167, audio_tagging_loss=0.01011, over 14214.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08922, pruned_loss=0.01183, audio_tagging_loss=0.008352, over 3036080.99 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:31:31,658 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578350
2023-11-27 10:31:46,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3855720.0, ans=0.1
2023-11-27 10:31:49,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.092e+01 9.675e+01 1.031e+02 1.166e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-27 10:31:51,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3855720.0, ans=0.0
2023-11-27 10:31:51,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3855720.0, ans=0.125
2023-11-27 10:31:58,526 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2023-11-27 10:32:05,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3855853.3333333335, ans=0.09899494936611666
2023-11-27 10:32:12,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3855853.3333333335, ans=0.025
2023-11-27 10:32:27,224 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1250, loss[loss=0.07644, simple_loss=0.1073, pruned_loss=0.01473, audio_tagging_loss=0.008038, over 16701.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08921, pruned_loss=0.01185, audio_tagging_loss=0.008406, over 3040548.90 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:32:27,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578400
2023-11-27 10:32:27,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3855986.6666666665, ans=10.0
2023-11-27 10:32:29,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3855986.6666666665, ans=0.125
2023-11-27 10:32:35,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3855986.6666666665, ans=0.0
2023-11-27 10:32:51,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3856120.0, ans=0.125
2023-11-27 10:32:56,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3856120.0, ans=0.0
2023-11-27 10:33:02,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0
2023-11-27 10:33:10,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3856253.3333333335, ans=0.125
2023-11-27 10:33:12,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3856253.3333333335, ans=0.0
2023-11-27 10:33:15,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3856253.3333333335, ans=0.125
2023-11-27 10:33:17,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3856253.3333333335, ans=0.125
2023-11-27 10:33:21,802 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1300, loss[loss=0.05406, simple_loss=0.07808, pruned_loss=0.0052, audio_tagging_loss=0.009818, over 16234.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08884, pruned_loss=0.01179, audio_tagging_loss=0.008468, over 3041305.36 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:33:21,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578450
2023-11-27 10:33:23,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0
2023-11-27 10:33:26,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3856320.0, ans=0.0
2023-11-27 10:33:35,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3856386.6666666665, ans=0.125
2023-11-27 10:33:39,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 9.062e+01 9.714e+01 1.030e+02 1.237e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-27 10:33:41,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3856386.6666666665, ans=0.125
2023-11-27 10:33:49,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3856453.3333333335, ans=0.125
2023-11-27 10:33:59,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0
2023-11-27 10:34:17,149 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1350, loss[loss=0.04757, simple_loss=0.07196, pruned_loss=0.00354, audio_tagging_loss=0.008053, over 15987.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.09017, pruned_loss=0.01206, audio_tagging_loss=0.008359, over 3047000.96 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:34:17,217 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578500
2023-11-27 10:34:18,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0
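The Whitening records compare a per-module metric against a limit (metric=3.61 vs. limit=6.0 just above). One plausible reading, assumed here rather than taken from scaling.py, is a normalized eigenvalue spread of the activation covariance: 1.0 for perfectly white features, growing as a few directions dominate.

```python
# A sketch of one way to quantify "whiteness": the normalized eigenvalue
# spread E[lambda^2] / E[lambda]^2 of the activation covariance. This is
# an assumed reading of the "metric=... vs. limit=..." records, not the
# module's actual code.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations of one module
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]      # (C, C) covariance matrix
    eigs = torch.linalg.eigvalsh(cov)   # real, since cov is symmetric
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

white = torch.randn(1000, 256)
print(whitening_metric(white))          # close to 1 (up to sampling noise)

scale = torch.ones(256)
scale[0] = 30.0                         # one dominant channel
print(whitening_metric(white * scale))  # far above any limit logged here
```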
2023-11-27 10:34:45,702 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:34:48,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3856853.3333333335, ans=0.0
2023-11-27 10:34:51,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3856853.3333333335, ans=0.07
2023-11-27 10:34:55,371 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 10:35:07,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3856920.0, ans=0.0
2023-11-27 10:35:12,559 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1400, loss[loss=0.0597, simple_loss=0.07217, pruned_loss=0.01175, audio_tagging_loss=0.01187, over 15156.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09041, pruned_loss=0.01233, audio_tagging_loss=0.008444, over 3046183.27 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:35:12,627 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578550
2023-11-27 10:35:30,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 9.274e+01 9.843e+01 1.071e+02 1.343e+02, threshold=1.969e+02, percent-clipped=0.0
2023-11-27 10:35:30,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3857053.3333333335, ans=0.2
2023-11-27 10:35:41,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3857120.0, ans=0.125
2023-11-27 10:36:07,246 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1450, loss[loss=0.04548, simple_loss=0.05651, pruned_loss=0.008083, audio_tagging_loss=0.009144, over 15012.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.0902, pruned_loss=0.01224, audio_tagging_loss=0.008567, over 3049442.74 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:36:07,316 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578600
2023-11-27 10:36:38,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3857453.3333333335, ans=0.0
2023-11-27 10:36:51,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3857586.6666666665, ans=0.125
2023-11-27 10:36:57,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3857586.6666666665, ans=0.0
2023-11-27 10:37:01,661 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2023-11-27 10:37:02,050 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1500, loss[loss=0.05859, simple_loss=0.08052, pruned_loss=0.01047, audio_tagging_loss=0.007864, over 16687.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08971, pruned_loss=0.01212, audio_tagging_loss=0.00863, over 3048754.35 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0
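Each train_asr.py:1235 record pairs a per-batch loss with a tot_loss whose frame count hovers near 3.0e6 rather than accumulating without bound, which suggests a decayed running sum with an effective window of a few hundred batches. A sketch under that assumption; the window size of 200 and the class below are guesses, not the tracker actually used:

```python
# A sketch of a decayed running loss, assuming each batch is folded in
# with decay (1 - 1/window): the frame mass then saturates near
# window * frames_per_batch instead of growing without bound.

class RunningLoss:
    def __init__(self, window: int = 200):  # window size is a guess
        self.decay = 1.0 - 1.0 / window
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss(window=200)
for _ in range(2000):  # ~15k frames per batch, as in the records above
    tracker.update(batch_loss_sum=0.065 * 15000, batch_frames=15000)
print(round(tracker.frames))   # ~3,000,000 frames, the scale seen here
print(round(tracker.loss, 3))  # 0.065
```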
2023-11-27 10:37:02,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578650
2023-11-27 10:37:09,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3857653.3333333335, ans=0.125
2023-11-27 10:37:10,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0
2023-11-27 10:37:14,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3857720.0, ans=0.125
2023-11-27 10:37:21,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.188e+01 9.715e+01 1.038e+02 1.214e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-27 10:37:29,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3857786.6666666665, ans=0.1
2023-11-27 10:37:33,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0
2023-11-27 10:37:39,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0
2023-11-27 10:37:57,649 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1550, loss[loss=0.06607, simple_loss=0.0953, pruned_loss=0.009773, audio_tagging_loss=0.008644, over 15943.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08947, pruned_loss=0.012, audio_tagging_loss=0.008718, over 3052656.78 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:37:57,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578700
2023-11-27 10:38:13,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3858053.3333333335, ans=0.125
2023-11-27 10:38:26,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3858120.0, ans=0.2
2023-11-27 10:38:29,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3858186.6666666665, ans=0.2
2023-11-27 10:38:44,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3858253.3333333335, ans=0.125
2023-11-27 10:38:52,547 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1600, loss[loss=0.04939, simple_loss=0.06197, pruned_loss=0.008656, audio_tagging_loss=0.009756, over 15525.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08999, pruned_loss=0.01206, audio_tagging_loss=0.00872, over 3049881.35 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:38:52,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578750
2023-11-27 10:39:00,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.82 vs. limit=10.0
2023-11-27 10:39:10,773 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.050e+01 9.679e+01 1.052e+02 1.346e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-27 10:39:25,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0
2023-11-27 10:39:40,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3858586.6666666665, ans=0.125
2023-11-27 10:39:46,752 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1650, loss[loss=0.06371, simple_loss=0.09315, pruned_loss=0.00955, audio_tagging_loss=0.007584, over 15318.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08898, pruned_loss=0.01183, audio_tagging_loss=0.008775, over 3047140.54 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0
2023-11-27 10:39:46,817 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578800
2023-11-27 10:40:02,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3858720.0, ans=0.125
2023-11-27 10:40:14,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0
2023-11-27 10:40:29,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3858853.3333333335, ans=0.125
2023-11-27 10:40:35,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3858920.0, ans=0.2
2023-11-27 10:40:36,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3858920.0, ans=0.0
2023-11-27 10:40:43,182 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1700, loss[loss=0.05323, simple_loss=0.07945, pruned_loss=0.005861, audio_tagging_loss=0.007642, over 15149.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.089, pruned_loss=0.01184, audio_tagging_loss=0.008777, over 3048204.98 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:40:43,250 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578850
2023-11-27 10:40:46,543 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=15.0
2023-11-27 10:40:49,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3858986.6666666665, ans=0.125
2023-11-27 10:40:49,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0
2023-11-27 10:40:53,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3859053.3333333335, ans=0.5
2023-11-27 10:40:58,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3859053.3333333335, ans=0.2
2023-11-27 10:41:02,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.167e+01 9.822e+01 1.054e+02 1.344e+02, threshold=1.964e+02, percent-clipped=0.0
2023-11-27 10:41:20,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3859186.6666666665, ans=0.025
2023-11-27 10:41:38,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0
2023-11-27 10:41:38,513 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1750, loss[loss=0.07597, simple_loss=0.1081, pruned_loss=0.01606, audio_tagging_loss=0.005861, over 15567.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08983, pruned_loss=0.01186, audio_tagging_loss=0.008673, over 3051851.15 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:41:38,585 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578900
2023-11-27 10:41:53,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=22.5
2023-11-27 10:41:54,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=22.5
2023-11-27 10:42:12,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3859520.0, ans=0.0
2023-11-27 10:42:14,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3859520.0, ans=0.125
2023-11-27 10:42:18,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5
2023-11-27 10:42:32,884 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1800, loss[loss=0.05786, simple_loss=0.07978, pruned_loss=0.008985, audio_tagging_loss=0.008985, over 15698.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08949, pruned_loss=0.01185, audio_tagging_loss=0.008622, over 3053921.67 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:42:32,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578950
2023-11-27 10:42:37,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3859653.3333333335, ans=0.125
2023-11-27 10:42:45,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0
2023-11-27 10:42:53,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.995e+01 9.639e+01 1.040e+02 1.222e+02, threshold=1.928e+02, percent-clipped=0.0
2023-11-27 10:42:57,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3859786.6666666665, ans=0.1
2023-11-27 10:43:05,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3859853.3333333335, ans=0.125
2023-11-27 10:43:07,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3859853.3333333335, ans=0.0
2023-11-27 10:43:12,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0
2023-11-27 10:43:22,482 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0
2023-11-27 10:43:26,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3859920.0, ans=0.125
2023-11-27 10:43:27,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3859986.6666666665, ans=0.0
2023-11-27 10:43:28,256 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1850, loss[loss=0.07679, simple_loss=0.1095, pruned_loss=0.01562, audio_tagging_loss=0.006397, over 15416.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08984, pruned_loss=0.01194, audio_tagging_loss=0.0085, over 3056748.30 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:43:28,319 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579000
2023-11-27 10:43:50,935 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:44:23,732 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1900, loss[loss=0.07562, simple_loss=0.1088, pruned_loss=0.01561, audio_tagging_loss=0.005638, over 16074.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08983, pruned_loss=0.01187, audio_tagging_loss=0.008447, over 3062302.68 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 8.0
2023-11-27 10:44:23,797 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579050
2023-11-27 10:44:33,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=12.0
2023-11-27 10:44:43,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3860386.6666666665, ans=0.125
2023-11-27 10:44:43,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0
2023-11-27 10:44:44,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 9.131e+01 9.734e+01 1.046e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0
2023-11-27 10:44:55,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3860520.0, ans=0.0
2023-11-27 10:45:06,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3860520.0, ans=0.0
2023-11-27 10:45:18,683 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1950, loss[loss=0.05182, simple_loss=0.07067, pruned_loss=0.006338, audio_tagging_loss=0.01015, over 15245.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08975, pruned_loss=0.0118, audio_tagging_loss=0.008362, over 3061084.33 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 8.0
2023-11-27 10:45:18,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579100
2023-11-27 10:45:19,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3860653.3333333335, ans=0.125
2023-11-27 10:45:40,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3860786.6666666665, ans=0.2
2023-11-27 10:45:40,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3860786.6666666665, ans=0.125
2023-11-27 10:45:50,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3860786.6666666665, ans=0.1
2023-11-27 10:46:13,736 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2000, loss[loss=0.07272, simple_loss=0.09518, pruned_loss=0.01656, audio_tagging_loss=0.00857, over 16624.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08932, pruned_loss=0.01167, audio_tagging_loss=0.00844, over 3054109.14 frames. ], batch size: 64, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:46:13,811 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579150
2023-11-27 10:46:18,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3860986.6666666665, ans=0.0
2023-11-27 10:46:32,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3861053.3333333335, ans=0.2
2023-11-27 10:46:35,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.839e+01 9.475e+01 1.022e+02 1.680e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-27 10:46:41,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3861120.0, ans=0.1
2023-11-27 10:46:51,104 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 10:46:55,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3861186.6666666665, ans=0.0
2023-11-27 10:46:55,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0
2023-11-27 10:46:56,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3861186.6666666665, ans=0.1
2023-11-27 10:47:05,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3861253.3333333335, ans=0.125
2023-11-27 10:47:10,263 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2050, loss[loss=0.05349, simple_loss=0.07252, pruned_loss=0.00939, audio_tagging_loss=0.007838, over 15409.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08868, pruned_loss=0.01157, audio_tagging_loss=0.008451, over 3044711.11 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:47:10,336 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579200
2023-11-27 10:48:00,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3861586.6666666665, ans=0.125
2023-11-27 10:48:07,618 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2100, loss[loss=0.07044, simple_loss=0.104, pruned_loss=0.01198, audio_tagging_loss=0.006481, over 14842.00 frames. ], tot_loss[loss=0.0641, simple_loss=0.0883, pruned_loss=0.01149, audio_tagging_loss=0.008459, over 3042358.96 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:48:07,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579250
2023-11-27 10:48:09,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5
2023-11-27 10:48:14,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0
2023-11-27 10:48:24,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=12.0
2023-11-27 10:48:28,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.950e+01 9.629e+01 1.055e+02 1.441e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-27 10:48:30,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3861786.6666666665, ans=10.0
2023-11-27 10:48:57,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3861920.0, ans=0.125
2023-11-27 10:49:02,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0
2023-11-27 10:49:03,226 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2150, loss[loss=0.05615, simple_loss=0.08119, pruned_loss=0.007658, audio_tagging_loss=0.007903, over 15858.00 frames. ], tot_loss[loss=0.06394, simple_loss=0.08763, pruned_loss=0.01159, audio_tagging_loss=0.008534, over 3046021.52 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
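grad_scale in these records steps between 8.0, 16.0 and 32.0, the signature of dynamic loss scaling in fp16 training: the scale is halved when scaled gradients overflow and grown back after a run of clean steps. A minimal torch.cuda.amp loop showing where such a value comes from; the model, optimizer, and scaler hyperparameters are illustrative, not this run's values:

```python
# A minimal fp16 step with dynamic loss scaling (requires a CUDA device);
# the logged grad_scale corresponds to scaler.get_scale(). The model,
# optimizer, and scaler hyperparameters below are illustrative only.
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1.38e-3)
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5
)

x = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).square().mean()

opt.zero_grad()
scaler.scale(loss).backward()  # backward through the scaled loss
scaler.step(opt)               # unscales grads; skips the step on overflow
scaler.update()                # halves the scale on overflow, else grows it
print(scaler.get_scale())      # e.g. 32.0 -> 16.0 after an overflow
```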
2023-11-27 10:49:03,300 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579300
2023-11-27 10:49:05,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3861986.6666666665, ans=0.0
2023-11-27 10:49:12,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3861986.6666666665, ans=0.2
2023-11-27 10:49:13,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3861986.6666666665, ans=10.0
2023-11-27 10:49:22,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3862053.3333333335, ans=0.09899494936611666
2023-11-27 10:49:35,560 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 10:49:41,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3862186.6666666665, ans=0.0
2023-11-27 10:49:49,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862253.3333333335, ans=0.1
2023-11-27 10:49:59,853 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2200, loss[loss=0.05125, simple_loss=0.0674, pruned_loss=0.009053, audio_tagging_loss=0.008498, over 14523.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08909, pruned_loss=0.01184, audio_tagging_loss=0.008439, over 3050437.01 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-27 10:49:59,983 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579350
2023-11-27 10:50:08,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3862320.0, ans=0.0